Preface



This volume contains the papers presented at the 1999 International Con- 
ference on Principles and Practice of Declarative Programming (PPDP’99) held 
in Paris from September 29 through October 1, 1999. PPDP’99 participated, 
together with the International Conference on Functional Programming (ICFP) 
and several related workshops, in a federation of colloquia known as Principles, 
Logics and Implementations of high-level programming languages (PLI’99). The
overall event was organized by the Institut National de Recherche en Infor- 
matique et en Automatique (INRIA) and the ACM Special Interest Group for 
Programming Languages (ACM/SIGPLAN). 

PPDP represents the union of two conferences that had been in existence 
for about a decade: Programming Languages, Implementations, Logics and Pro- 
grams (PLILP) and Algebraic and Logic Programming (ALP). These conferences 
were held as one for the first time under the name PLILP/ALP in their tenth and
seventh respective incarnations last year. The present rendition follows a decision 
by the combined steering committees to adopt a simpler name for the conference 
that also reflected the union. Continuing the tradition of PLILP/ALP, PPDP
aims to stimulate research in the use of declarative methods in programming 
and on the design, application, and implementation of programming languages 
that support such methods. Topics of interest include the use of type theory, 
logics, and logical methods in understanding, defining, integrating, and extend- 
ing programming paradigms such as those for functional, logic, object-oriented, 
constraint, and concurrent programming; support for modularity; the use of log- 
ics in the design of program development tools; development of implementation 
methods; and the application of the relevant paradigms and associated methods 
in industry and education. Many of these themes are reflected in the papers 
appearing in the present collection. Of particular note in these proceedings is 
the broad interpretation of declarative programming and the emphasis on both 
principles and practice in this area of research. 

A few words about the selection of papers. Fifty-one full-length papers were 
received in response to the call for submissions. Each of these papers was re- 
viewed by at least four individuals. The program committee met electronically 
in the last two weeks of April 1999 and, based on the reviews, selected 22 pa- 
pers for presentation at the conference. A decision was also made during this 
meeting to include invited talks by Georges Gonthier (INRIA-Rocquencourt, 
France), Simon Peyton Jones (Microsoft Research, UK) and Pascal van Henten- 
ryck (Catholic University of Louvain, Belgium), and tutorials by Chris Okasaki 
(Columbia University, USA) and Frank Pfenning (Carnegie Mellon University, 
USA) in the scientific program. These proceedings include all 22 contributed 
papers that were accepted, revised in accordance with the suggestions of the 
reviewers. Also included are papers that complement the presentations of Simon 
Peyton Jones and Pascal van Hentenryck and an abstract of the tutorial by
Frank Pfenning. Papers accompanying the remaining invited talk and tutorial
were not received by the time of going to press. 

Many people and institutions are to be acknowledged for their contributions 
to PPDP’99. The organization of this conference and PLI’99 would not have
been possible but for the efforts of François Fages and Didier Rémy, the chairs
of PPDP’99 and ICFP’99, and Annick Theis-Viemont and the INRIA staff. The
quality of the technical program owes much to the diligence of the program com- 
mittee members and the several referees whose help they enlisted. In addition to 
providing careful reviews of submitted papers, many of these individuals partic- 
ipated in extended discussions at the PC meeting towards ensuring consistency 
and accuracy in the selection process. At a financial level, PPDP’99 benefitted 
from a grant from the European Commission program for Training and Mobility 
of Researchers; this grant was mediated by the European Association for Pro- 
gramming Languages and Systems (EAPLS). Additional financial support was 
provided by the Centre National de la Recherche Scientifique (CNRS), CompulogNet,
Microsoft Research, Ministère de l’Éducation Nationale, de la Recherche
et de la Technologie (Gouv. France), Trusted Logic, and France Telecom. Finally,
the meeting received an endorsement from the Association for Logic Program- 
ming. 

Abstract. For a compiler writer, generating good machine code for a
variety of platforms is hard work. One might try to reuse a retargetable 
code generator, but code generators are complex and difficult to use, 
and they limit one’s choice of implementation language. One might try 
to use C as a portable assembly language, but C limits the compiler 
writer’s flexibility and the performance of the resulting code. The wide 
use of C, despite these drawbacks, argues for a portable assembly
language. C-- is a new language designed expressly for this purpose. The
use of a portable assembly language introduces new problems in the
support of such high-level run-time services as garbage collection, exception
handling, concurrency, profiling, and debugging. We address these
problems by combining the C-- language with a C-- run-time interface. The
combination is designed to allow the compiler writer a choice of source- 
language semantics and implementation techniques, while still providing 
good performance. 



1 Introduction 

Suppose you are writing a compiler for a high-level language. How are you to 
generate high-quality machine code? You could do it yourself, or you could try 
to take advantage of the work of others by using an off-the-shelf code gener- 
ator. Curiously, despite the huge amount of research in this area, only three 
retargetable, optimizing code generators appear to be freely available: VPO [6], 
ML-RISC [16], and the gcc back end [33]. Each of these impressive systems has a 
rich, complex, and ill-documented interface. Of course, these interfaces are quite 
different from one another, so once you start to use one, you will be unable to 
switch easily to another. Furthermore, they are language-specific. To use ML- 
RISC you must write your front end in ML, to use the gcc back end you must 
write it in C, and so on. 

All of this is most unsatisfactory. It would be much better to have one 
portable assembly language that could be generated by a front end and imple- 
mented by any of the available code generators. So pressing is this need that it has 
become common to use C as a portable assembly language [2,5,25,37,18,22,30].
Unfortunately, C was never intended for this purpose — it is a programming lan- 
guage, not an assembly language. C locks the implementation into a particular 
calling convention, makes it impossible to compute targets of jumps, provides no 
support for garbage collection, and provides very little support for exceptions or 
debugging (Section 2). 

The obvious way forward is to design a language specifically as a compiler 
target language. Such a language should serve as the interface between a com- 
piler for a high-level language (the front end) and a retargetable code generator 
(the back end). The language would not only make the compiler writer’s life
much easier, but would also give the author of a new code generator a
ready-made customer base. In an earlier paper we propose a design for just such a
language, C-- [24], but the story does not end there. Separating the front and
back ends greatly complicates run-time support. In general, the front end, back
end, and run-time system for a programming language are designed together.
They cooperate intimately to support such high-level features as garbage
collection, exception handling, debugging, profiling, and concurrency, which we
call high-level run-time services. If the back end is a portable assembler like
C--, we want the cooperation without the intimacy; an implementation of C--
should be independent of the front ends with which it will be used.

One alternative is to make all these high-level services part of the abstraction 
offered by the portable assembler. For example, the Java Virtual Machine, which 
provides garbage collection and exception handling, has been used as a target 
for languages other than Java, including Ada [36], ML [7], Scheme [12], and 
Haskell [39]. But a sophisticated platform like a virtual machine embodies too 
many design decisions. For a start, the semantics of the virtual machine may 
not match the semantics of the language being compiled (e.g., the exception 
semantics). Even if the semantics happen to match, the engineering tradeoffs 
may differ dramatically. For example, functional languages like Haskell or Scheme 
allocate like crazy [14], and JVM implementations are typically not optimised 
for this case. Finally, a virtual machine typically comes complete with a very 
large infrastructure — class loaders, verifiers and the like — that may well be 
inappropriate. Our intended level of abstraction is much, much lower. 

Our problem is to enable a client to implement high-level services, while 
still using C-- as a code generator. As we discuss in Section 4, supporting
high-level services requires knowledge from both the front and back ends. The insight
behind our solution is that C-- should include not only a low-level assembly
language, for use by the compiler, but also a low-level run-time system, for use
by the front end’s run-time system. The only intimate cooperation required is
between the C-- back end and its run-time system; the front end works with
C-- at arm’s length, through a well-defined language and a well-defined
run-time interface (Section 5). This interface adds something fundamentally new:
the ability to inspect and modify the state of a suspended computation. 

It is not obvious that this approach is workable. Can just a few
assembly-language capabilities support many high-level run-time services? Can the
front-end run-time system easily implement high-level services using these capabilities?
How much is overall efficiency compromised by the arm’s-length relationship
between the front-end runtime and the C-- runtime? We cannot yet answer these
questions definitively. Instead, the primary contributions of this paper are to
identify needs that are common to various high-level services, and to propose
specific mechanisms to meet these needs. We demonstrate only how to use C--
to implement the easiest of our intended services, namely garbage collection.
Refining our design to accommodate exceptions, concurrency, profiling, and
debugging has emerged as an interesting research challenge.



2 It’s Impossible — or it’s C 

The dream of a portable assembler has been around at least since UNCOL [13]. 
Is it an impossible dream, then? Clearly not: C’s popularity as an assembler is 
clear evidence that a need exists, and that something useful can be done. 

If C is so popular, then perhaps C is perfectly adequate? Not so. There are 
many difficulties, of which the most fundamental are these: 

— The C route rewards those who can map their high-level language rather 
directly onto C. A high-level language procedure becomes a C procedure, 
and so on. But this mapping is often awkward, and sometimes impossible. 
For example, some source languages fundamentally require tail-call
optimisation: a procedure call whose result is returned to the caller of the current
procedure must be executed in the stack frame of the current procedure. This 
optimisation allows iteration to be implemented efficiently using recursion. 
More generally, it allows one to think of a procedure as a labelled extended 
basic block that can be jumped to, rather than as a sub-program that can only
be called. Such procedures give a front end the freedom to design its own 
control flow. 

It is very difficult to implement the tail-call optimisation in C, and no C com- 
piler known to us does so across separately compiled modules. Those using C 
have been very ingenious in finding ways around this deficiency [34,37,25,18], 
but the results are complex, fragile, and heavily tuned for one particular im- 
plementation of C (usually gcc). 

— A C compiler may lay out its stack frames as it pleases. This makes it difficult
for a garbage collector to find the live pointers. Implementors either arrange 
not to keep pointers on the C stack, or they use a conservative garbage 
collector. These restrictions are Draconian. 

— The unknown stack-frame layout also complicates support for exception 
handling, debugging, profiling, and concurrency. For example, an exception- 
handling mechanism needs to walk the stack, perhaps removing stack frames 
as it goes. Again, C makes it essentially impossible to implement such mecha- 
nisms, unless they can be closely mapped onto what C provides (i.e., setjmp 
and longjmp). 

— A C compiler has to be very conservative about the possibility of memory
aliasing. This seriously limits the ability of the instruction scheduler to per- 
mute memory operations or hoist them out of a loop. The front-end compiler 
often knows that aliasing cannot occur, but there is no way to convey this 
information to the C compiler. 
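
To make the tail-call difficulty concrete, here is a minimal sketch in C of one well-known workaround, a trampoline. It is offered as an illustration only and is not taken from the systems cited above; the names args, next, sp_help, trampoline, and sum_prod are hypothetical. Each procedure returns the next procedure to run instead of calling it, and a driver loop performs the transfer, so the C stack stays at constant depth.

```c
#include <stddef.h>
#include <stdint.h>

/* Arguments are passed through memory rather than as C parameters. */
typedef struct args { uint32_t n, s, p; } args;

/* A "next procedure to run"; a NULL fn means the computation is done. */
typedef struct next next;
struct next { next (*fn)(args *); };

/* Body of a tail-recursive procedure. Instead of calling itself in tail
   position, it updates the arguments and returns itself to the driver. */
static next sp_help(args *a) {
    if (a->n == 1) {
        return (next){ NULL };      /* finished: results are in a->s, a->p */
    }
    a->s += a->n;
    a->p *= a->n;
    a->n -= 1;
    return (next){ sp_help };       /* request a transfer to sp_help */
}

/* The trampoline: performs every transfer from a single C stack frame. */
static void trampoline(next k, args *a) {
    while (k.fn != NULL) {
        k = k.fn(a);
    }
}

/* Sum and product of 1..n, for n >= 1 (assumed, not checked). */
args sum_prod(uint32_t n) {
    args a = { n, 1, 1 };
    trampoline((next){ sp_help }, &a);
    return a;
}
```

Every transfer in this sketch costs a return, a loop iteration, and an indirect call, and the arguments live in memory rather than registers; this is one example of the kind of overhead that such workarounds can introduce.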

So much for fundamental issues. C also lacks the ability to control a number 
of important low-level features, including returning multiple values in registers 
from a procedure, mis-aligned memory accesses, arithmetic, data layout, and 
omitting range checks on multi-way jumps. 

In short, C is awkward to use as a portable assembler, and many of these 
difficulties translate into performance hits. A portable assembly language should 
be able to offer better performance, as well as greater ease of use. 



3 An Overview of C--



    /* Ordinary recursion */
    export sp1;
    sp1( bits32 n ) {
      bits32 s, p;
      if n == 1 {
        return( 1, 1 );
      } else {
        s, p = sp1( n-1 );
        return( s+n, p*n );
      }
    }

    /* Tail recursion */
    export sp2;
    sp2( bits32 n ) {
      jump sp2_help( n, 1, 1 );
    }
    sp2_help( bits32 n, bits32 s, bits32 p ) {
      if n==1 {
        return( s, p );
      } else {
        jump sp2_help( n-1, s+n, p*n )
      }
    }

    /* Loops */
    export sp3;
    sp3( bits32 n ) {
      bits32 s, p;
      s = 1; p = 1;
    loop:
      if n==1 {
        return( s, p );
      } else {
        s = s+n;
        p = p*n;
        n = n-1;
        goto loop;
      }
    }

Fig. 1. Three functions that compute the sum $\sum_{i=1}^{n} i$ and product
$\prod_{i=1}^{n} i$, written in C--.

In this section we give an overview of the design of C--. Fuller descriptions
can be found in [24] and in [28]. Figure 1 gives examples of some C-- procedures
that give a flavour of the language. Despite its name, C-- is by no means a subset
of C, as will become apparent; C was simply our jumping-off point.
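
For readers who think in C, the following sketch gives rough C counterparts of the ordinary-recursion and loop procedures of Figure 1. It is an analogy under stated assumptions, not a translation defined by C--: the names pair, sp1_c, and sp3_c are hypothetical, and because a C function returns a single value, the two results are packaged in a struct.

```c
#include <stdint.h>

/* C has no multiple results, so the sum and product travel in a struct. */
typedef struct pair { uint32_t s, p; } pair;

/* Counterpart of the ordinary-recursion procedure (n >= 1 assumed). */
pair sp1_c(uint32_t n) {
    if (n == 1) {
        return (pair){ 1, 1 };
    } else {
        pair r = sp1_c(n - 1);
        return (pair){ r.s + n, r.p * n };
    }
}

/* Counterpart of the loop procedure, keeping the label and goto. */
pair sp3_c(uint32_t n) {
    uint32_t s = 1, p = 1;
loop:
    if (n == 1) {
        return (pair){ s, p };
    } else {
        s = s + n;
        p = p * n;
        n = n - 1;
        goto loop;
    }
}
```

Whether a C compiler returns such a struct in registers depends on the platform's C calling convention, which the C programmer does not control.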

3.1 What is a Portable Assembler?

C-- is an assembly language (an abstraction of hardware), not a high-level
programming language. Hardware provides computation, control flow, memory,
and registers; C-- provides corresponding abstractions.

— C-- expressions and assignments are abstractions of computation. C-- provides
a rich set of computational operators, but these operators work only
on machine-level data types: bytes, words, etc. The expression abstraction
hides the particular combination of machine instructions needed to compute
values, and it hides the machine registers that may be needed to hold
intermediate results.

— C--’s goto and if statements are abstractions of control flow. (For convenience,
C-- also provides structured control-flow constructs.) The if abstraction
hides the machine’s “condition codes;” branch conditions are arbitrary
Boolean expressions.

— C-- treats memory much as the machine does, except that addresses used
in C-- programs may be arbitrary expressions. This abstraction hides the
limitations of the machine’s addressing modes.

— C-- variables are an abstraction of registers. A C-- back end puts as many
variables as possible in registers; others go in memory. This abstraction hides
the number and conventional uses of the machine’s registers.

— In addition, C-- provides a procedure abstraction, the feature that looks least
like an abstraction of a hardware primitive. However, many processor architectures
provide direct support for procedures, although the nature of that
support varies widely (procedure call or multiple register save instructions,
register windows, link registers, branch prediction for return instructions,
and so on). Because of this variety, calling conventions and activation-stack
management are notoriously architecture dependent and hard to specify. C--
therefore offers procedures as a primitive abstraction, albeit in a slightly
unusual form (Section 3.4).

Our goal is to make it easy to retarget front ends, not to make every C--
program runnable everywhere. Although every C-- program has a well-defined
semantics that is independent of any machine, a front end translating a single
source program might need to generate two different C-- programs for two
different target architectures. For example, a C-- program generated for a machine
with floating-point instructions would be different from a C-- program
generated for a machine without floating-point instructions.

Our goal contrasts sharply with “write once; run anywhere,” the goal of
such distribution formats as Java class files, Juice files [15], and ANDF or
TenDRA [20]. These formats are abstractions of high-level languages, not of
underlying machines. Their purpose is binary portability, and they retain enough
high-level language semantics to permit effective compilation at the remote
installation site.

Even though C-- exposes a few architecture-specific details, like word size,
the whole point is to hide those details, so that the front end’s job can be largely
independent of the target architecture. A good C-- implementation therefore
must do substantial architecture-dependent work. For example:



— Register allocation. 

— Instruction selection, exploiting complex instructions and addressing modes. 

— Instruction scheduling. 

— Stack-frame layout.

— Classic back-end optimisations such as common-subexpression elimination 
and copy propagation. 

— If-conversion for predicated architectures. 

Given these requirements, C-- resembles a typical compiler’s intermediate
language more than a typical machine’s assembly language.



3.2 Types 

C-- supports a bare minimum of data types: a family of bits types (bits8,
bits16, bits32, bits64), and a family of floating-point types (float32, float64,
float80). These types encode only the size (in bits) and the kind of register
(general-purpose or floating-point) required for the datum.

Not all types are available on all machines; for example, a C-- program
emitted by a front-end compiler for a 64-bit machine might be rejected if fed to
a C-- implementation for a 32-bit machine. It is easy to tell a front end how big
to make its data types, and doing so makes the front end’s job easier in some
ways; for example, it can compute offsets statically.

The bits types are used for characters, bit vectors, integers, and addresses 
(pointers). On each architecture, a bits type is designated the “native word
type” of the machine. A “native code-pointer type” and “native data-pointer
type” are also designated; exported and imported names must have one of these
pointer types. On many machines, all three types are the same, e.g., bits32.



3.3 Static Allocation 

C-- offers detailed control of static memory layout, much as ordinary assemblers
do. A data block consists of a sequence of labels, initialised data values,
uninitialised arrays, and alignment directives. For example: 

    data {
      foo:  bits32{10};        /* One bits32 initialised to 10 */
            bits32{1,2,3,4};   /* Four initialised bits32’s */
            bits32[8];         /* Uninitialised array of 8 bits32’s */
      baz1:
      baz2: bits8              /* An uninitialised byte */
      end:
    }



Here foo is the address of the first bits32, baz1 and baz2 are both the
address of the bits8, and end is the address of the byte after the bits8. The
labels foo, baz1, etc., should be thought of as addresses, not as memory locations.
They are all immutable constants of the native data-pointer type; they cannot 
be assigned to. 

How, then, can one access the memory at location foo? Memory accesses 
(loads and stores) are typed, and denoted with square brackets. Thus the state- 
ment: 

bits32[foo] = bits32[foo] + 1; 

loads a bits32 from the location whose address is in foo, adds one to it, and 
stores it at the same location. The mnemonic for this syntax is to think of bits32 
as a C-like array representing all of memory, and bits32[foo] as a particular 
element of that array. The semantics of the address is not C-like, however; the 
expression in brackets is the byte address of the item. Further, foo’s type is 
always the native data-pointer type; the type of value stored at foo is specified 
by the load or store operation itself. So this is perfectly legal: 

bits8[foo+2] = bits8[foo+2] - 1; 

This statement modifies only the byte at address foo+2. 
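
The "array representing all of memory" mnemonic can be sketched in C as follows. This is only a model, under the assumption that memory is a small byte array; the names memory, load_bits32, store_bits32, load_bits8, store_bits8, and example are hypothetical and are not part of C--. The index is always a byte address, and the width of the access is chosen by the operation, not by the address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "all of memory", indexed by byte address. */
static uint8_t memory[64];

/* bits32[addr] used as a load; memcpy avoids alignment assumptions. */
uint32_t load_bits32(uint32_t addr) {
    uint32_t v;
    memcpy(&v, &memory[addr], sizeof v);
    return v;
}

/* bits32[addr] used as the target of a store. */
void store_bits32(uint32_t addr, uint32_t v) {
    memcpy(&memory[addr], &v, sizeof v);
}

/* bits8[addr] used as a load and as the target of a store. */
uint8_t load_bits8(uint32_t addr) { return memory[addr]; }
void store_bits8(uint32_t addr, uint8_t v) { memory[addr] = v; }

/* The two statements from the text, for some byte address foo. */
void example(uint32_t foo) {
    /* bits32[foo] = bits32[foo] + 1; */
    store_bits32(foo, load_bits32(foo) + 1);
    /* bits8[foo+2] = bits8[foo+2] - 1; */
    store_bits8(foo + 2, (uint8_t)(load_bits8(foo + 2) - 1));
}
```

The same address foo serves both a 32-bit and an 8-bit access, which is the point of the text: the type belongs to the load or store, not to foo.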

Unlike C, C-- has no implicit alignment or padding. Therefore, the address
relationships between the data items within a single data block are
machine-independent; for example, baz1 = foo+52. An explicit align directive provides
alignment where that is required.
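
The offset of 52 follows directly from the sizes in the data block shown earlier: one bits32 (4 bytes), four bits32's (16 bytes), and an array of eight bits32's (32 bytes) precede the bits8. As a small worked check, the arithmetic can be written as C constants; the OFF_ names are hypothetical and exist only for this illustration.

```c
/* Byte offsets of the labels in the example data block, relative to foo.
   With no implicit padding, each offset is the previous offset plus the
   size of the item just laid out. */
enum {
    OFF_FOO  = 0,
    OFF_INIT = OFF_FOO  + 1 * 4,  /* after bits32{10}       */
    OFF_ARR  = OFF_INIT + 4 * 4,  /* after bits32{1,2,3,4}  */
    OFF_BAZ1 = OFF_ARR  + 8 * 4,  /* after bits32[8]        */
    OFF_BAZ2 = OFF_BAZ1,          /* two labels, one address */
    OFF_END  = OFF_BAZ2 + 1       /* after the bits8        */
};
```

A C struct with the same members would not give this guarantee, since a C compiler is free to insert padding between members.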

C-- supports multiple, named data sections. For example:

data "debug" { 

} 

This syntax declares the block of data to belong to the section named "debug". 
Code is by default placed in the section "text", and a data directive with no 
explicit section name defaults to the section "data". Procedures can be enclosed 
in code "mytext" { ... } to place them in a named section "mytext". 

C-- expects that, when linking object files, the linker concatenates sections
with the same name. (For backwards compatibility with some existing linkers,
front ends may wish to emit an alignment directive at the beginning of each
C-- section.) C-- assigns no other semantics to the names of data sections,
but particular implementations may assign machine-dependent semantics. For
example, a MIPS implementation might assume that data in sections named
"rodata" is read-only.

3.4 Procedures 

C-- supports procedures that are both more and less general than C procedures;
for example, C-- procedures offer multiple results and full tail calls, but they
have a fixed number of arguments. Specifically:

— A C-- procedure, such as sp1 in Figure 1, has parameters, such as n, and local
variables, such as s and p. Parameters and variables are mapped onto machine
registers where possible, and only spilled to the stack when necessary.
In this absolutely conventional way C-- abstracts away from the number of
machine registers actually available. As with registers, C-- provides no way
to take the address of a parameter or local variable.

— C-- supports fully general tail calls, identified as “jumps”. Control does not
return from jumps, and C-- implementations must deallocate the caller’s
stack frame before each jump. For example, the procedure sp2_help in Figure 1
uses a jump to implement tail recursion.

— C-- supports procedures with multiple results, just as it supports procedures
with multiple arguments. Indeed, a return is somewhat like a jump to a
procedure whose address happens to be held in the topmost activation record
on the control stack, rather than being specified explicitly. All the procedures
in Figure 1 return two results; procedure sp1 contains a call site for such a
procedure.

— A C-- procedure call is always a complete statement, which passes expressions
as parameters and assigns results to local variables. Although high-level
languages allow a call to occur in an expression, C-- forbids it. For example,
it is illegal to write

r = f( g(x) ); /* illegal */ 

because the result returned by g(x) cannot be an argument to f. Instead, 
one must write two separate calls: 

y = g(x) ; 

r = f(y); 

This restriction makes explicit the order of evaluation, the location of each
call site, and the names and types of temporaries used to hold the results
of calls. (For similar reasons, assignments in C-- are statements, not expressions,
and C-- operators have no side effects. In particular, C-- provides no
analog of C’s “p++.”)

— To handle high-level variables that can’t be represented using C--’s primitive
types, C-- can be asked to allocate named areas in the procedure’s activation
record.

    f (bits32 x) {
      bits32 y;
      stack { p : bits32;
              q : bits32[40];
            }
      /* Here, p and q are the addresses of the relevant chunks
         of data. Their type is the native data-pointer type. */
    }

stack is rather like data; it has the same syntax between the braces, but it 
allocates on the stack. As with data, the names are bound to the addresses 
of the relevant locations, and they are immutable. C-- makes no provision
for dynamically-sized stack allocation (yet). 

— The name of a procedure is a C-- expression of native code-pointer type. The
procedure specified in a call statement can be an arbitrary expression, not
simply the statically-visible name of a procedure. For example, the following
statements are both valid, assuming the procedure sp1 is defined in this
compilation unit, or imported from another one:

bits32[ptr] = sp1; /* Store procedure address */

r,s = (bits32[ptr])( 4 ); /* Call stored procedure */

— A C-- procedure, like sp3 in Figure 1, may contain gotos and labels, but they
serve only to allow a textual representation of the control-flow graph. Unlike
procedure names, labels are not values, and they have no representation
at run time. Because this restriction makes it impossible for front ends to
build jump tables from labels, C-- includes a switch statement, for which the
C-- back end generates efficient code. The most efficient mix of conditional
branches and indexed branches may depend on the architecture [8].

Jump tables of procedure addresses (rather than labels) can be built, of
course, and a C-- procedure can use the jump statement to make a tail call
to a computed address.
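
In C terms, a table of procedure addresses corresponds roughly to an array of function pointers. The sketch below is an analogy only, with hypothetical names (proc, inc, dbl, sqr, table, dispatch). Unlike a C-- jump, the call in dispatch is not guaranteed to be compiled as a tail call; whether it is depends on the C compiler and its settings.

```c
#include <stdint.h>

/* A procedure address: here, any function from uint32_t to uint32_t. */
typedef uint32_t (*proc)(uint32_t);

static uint32_t inc(uint32_t x) { return x + 1; }
static uint32_t dbl(uint32_t x) { return x * 2; }
static uint32_t sqr(uint32_t x) { return x * x; }

/* A jump table built from procedure addresses rather than labels. */
static const proc table[] = { inc, dbl, sqr };

/* Transfer control to a computed address. The caller must ensure
   tag < 3; like the unchecked multi-way jumps discussed in Section 2,
   no range check is performed here. */
uint32_t dispatch(uint32_t tag, uint32_t arg) {
    return table[tag](arg);
}
```

For example, dispatch(1, 4) selects dbl and yields 8.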

3.5 Calling Conventions 

The calling convention for C-- procedures is entirely a matter for the C--
implementation; we call it the standard C-- calling convention. In particular, C--
need not use the C calling convention.

The standard calling convention places no restrictions on the number of ar- 
guments passed to a function or the number of results returned from a func- 
tion. The only restrictions are that the number and types of actual parameters 
must match those in the procedure declaration, and similarly, that the number 
and types of values returned must match those expected at the call site. These 
restrictions enable efficient calling sequences with no dynamic checks. (A C--
implementation need not check that C-- programs meet these restrictions.)

We note the following additional points: 

— If a C-- function does not “escape” (that is, if all sites where it is called can
be identified statically), then the C-- back end is free to create and use
specialised instances, with specialised calling conventions, for each call site.
Escape analysis is necessarily conservative, but a function may be deemed
to escape only if its name is used other than in a call, or if it is named in
an export directive.

— Support for unrestricted tail calls requires an unusual calling convention, so 
that a procedure making a tail call can deallocate its activation record while 
still leaving room for parameters that do not fit in registers. 

— C-- allows the programmer to specify a particular calling convention (chosen
from a small set of standard conventions) for an individual procedure, so that
C-- code can interoperate with foreign code. For example, even though C--’s
standard calling convention may differ from C’s, one can ask for a particular
procedure to use C’s convention, so that the procedure can be called from
an external C program. Similarly, external C procedures can be called from
a C-- procedure by specifying the calling convention at the call site.

Some C-- implementations may provide two versions of C’s calling convention.
The lightweight version would be like an ordinary C call, but it would be
useful only when the C procedures terminate quickly; if control were trans- 
ferred to the run-time system while a C procedure was active, the run-time 
system might not be able to find values that were in callee-saves registers 
at the time of the call. The heavyweight version would keep all its state on 
the stack, not in callee-saves registers, so the run-time system could handle 
a stack containing a mix of C and C-- activations.

3.6 Miscellaneous 

Like other assemblers, C-- gives programmers the ability to name compile-time
constants, e.g., by 

const GC = 2; 

C-- variables may be declared global, in which case the C-- compiler attempts
to put them in registers. For example, given the declaration

global { 
bits32 hp; 

} 

the implementation attempts to put variable hp in a register, but if no register 
is available, it puts hp in memory. C-- programs use and assign to hp without
knowing whether it is in a register or in memory. Unlike storage allocated by
data, there is no such thing as “the address of a global”, so memory stores to
unknown addresses cannot affect the value of a global. This permits a global to be 
held in a register and, even if it has to be held in memory, the optimiser does not 
need to worry about re-loading it after a store to an unknown memory address. 
All separately compiled modules must have identical global declarations, or 
horribly strange things will happen. 

global declarations may name specific (implementation-dependent) registers,
for example:

global { 

bits32 hp "%ebx";
bits32 hplim "%esi";

} 

We remarked in Section 2 that the front end may know a great deal about (lack 
of) aliasing between memory access operations. We do not yet have a way to 
express such knowledge in C--, but an adaptation of [21] looks promising.

4 The Problem of Run-Time Support 

When a front end and back end are written together, as part of a single compiler, 
they can cooperate intimately to support high-level run-time services, such as 
garbage collection, exception handling, profiling, concurrency, and debugging. In 
the C-- framework, the front and back ends work at arm’s length. As mentioned
earlier, our guiding principle is this: 

C-- should make it possible to implement high-level run-time services,
but it should not actually implement any of them. Rather, it should 
provide just enough “hooks” to allow the front-end run-time system to 
implement them. 

Separating policy from mechanism in this way is easier said than done. It might 
appear more palatable to incorporate garbage collection, exception handling, 
and debugging into the C-- language, as (say) the Java Virtual Machine does.
But doing so would guarantee that C-- would never be used. Different source
languages require different support, different object layouts, and different excep- 
tion semantics — especially when performance matters. No one back end could 
satisfy all customers. 

Why is the separation between front and back end hard to achieve? High-level 
run-time services need to inspect and modify the state of a suspended program. A 
garbage collector must find, and perhaps modify, all live pointers. An exception 
handler must navigate, and perhaps unwind, the call stack. A profiler must 
correlate object-code locations with source-code locations, and possibly navigate 
the call stack. A debugger must allow the user to inspect, and perhaps modify, 
the values of variables. All of these tasks require information from both front 
and back ends. The rest of this section elaborates. 

Finding roots for garbage collection. If the high-level language requires
accurate garbage collection, then the garbage collector must be able to find all
the roots that point into the heap. If, furthermore, the collector supports
compaction, the locations of heap objects may change during garbage
collection, and the collector must be able to redirect each root to point to the
new location of the corresponding heap object.
The difficulty is that neither the front end nor the back end has all the
knowledge needed to find roots at run time. Only the front end knows which
source-language variables, and therefore which C-- variables, represent pointers
into the heap. Only the back end, which maps variables to registers and stack
slots, knows where those variables are located at run time. Even the back
end can’t always identify exact locations; variables mapped to callee-saves
registers may be saved arbitrarily far away in the call stack, at locations not
identifiable until run time.

Printing values in a debugger. A debugger needs compiler support to print
the values of variables. For this task, information is divided in much the
same way as for garbage collection. Only the front end knows how
source-language variables are mapped onto (collections of) C-- variables. Only the
front end knows how to print the value of a variable, e.g., as determined by
the variable’s high-level-language type. Only the back end knows where to
find the values of the C-- variables.

Loci of control. A debugger must be able to identify the “locus of control” in
each activation, and to associate that locus with a source-code location. This
association is used both to plant breakpoints and to report the source-code
location when a program faults.

An exception mechanism also needs to identify the locus of control, because 
in some high-level languages, that locus determines which handler should 
receive the exception. When it identifies a handler, the exception mechanism 
unwinds the stack and changes the locus of control to refer to the handler. 
A profiler must map loci of control into entities that are profiled: procedures, 
statements, source-code regions, etc. 

At run time, loci of control are represented by values of the program counter
(e.g., return addresses), but at the source level, loci of control are associated
with statements in a high-level language or in C--. Only the front end knows
how to associate high-level source locations or exception-handler scopes with
C-- statements. Only the back end knows how to associate C-- statements
with the program counter.

Liveness. Depending on the semantics of the original source language, the locus
of control may determine which variables of the high-level language are
visible. Depending on the optimizations performed by the back end, the locus
of control may determine which C-- variables are live, and therefore have
values. Debuggers should not print dead variables. Garbage collectors should
not trace them; tracing dead pointers could cause space leaks. Worse,
tracing a register that once held a root but now holds a non-pointer value could
violate the collector’s invariants. Again, only the front end knows which
variables are interesting for debugging or garbage collection, but only the back
end knows which are live at a given locus of control.

Exception values. In addition to unwinding the stack and changing the locus
of control, the exception mechanism may have to communicate a value to an
exception handler. Only the front end knows which variable should receive
this value, but only the back end knows where variables are located.

Succinctly stated, each of these operations must combine two kinds of
information:

— Information that only the front end has:

    Which C-- parameters and local variables are heap pointers.

    How to map source-language variables to C-- variables and how to
    associate source-code locations with C-- statements.

    Which exception handlers are in scope at which C-- statements, and
    which variables are visible at which C-- statements.

— Information that only the back end has:

    Whether each C-- local variable and parameter is live, where it is
    located (if live), and how this information changes as the program counter
    changes.

    Which program-counter values correspond to which C-- statements.

    How to find activations of all active procedures and how to unwind
    stacks.

5 Support for High-Level Run-Time Services 

The main challenge, then, is arranging for the back end and front end to share 
information, without having to implement them as a single integrated unit. In 
this section we describe a framework that allows this to be done. We focus on 
garbage collection as our illustrative example. Other high-level run-time services 
can fit in the same framework, but each requires service-specific extensions; we 
sketch some ideas in Section 7. 

In what follows, we use the term “variable” to mean either a parameter of 
the procedure or a locally-declared variable. 

5.1 The Framework 

We assume that executable programs are divided into three parts, each of which 
may be found in object files, libraries, or a combination. 

— The front end compiler translates the high-level source program into one or
more C-- modules, which are separately translated to generated object code
by the C-- compiler.

— The front end comes with a (probably large) front-end run-time system.
This run-time system includes the garbage collector, exception handler, and
whatever else the source language needs. It is written in a programming
language designed for humans, not in C--; in what follows we assume that
the front end run-time system is written in C.

— Every C-- implementation comes with a (hopefully small) C-- run-time
system. The main goal of this run-time system is to maintain and provide access
to information that only the back end can know. It makes this information
available to the front end run-time system through a C-language run-time
interface, which we describe in Section 5.2. Different front ends may
interoperate with the same C-- run-time system.

To make an executable program, we link generated object code with both
run-time systems.

In outline, C-- can support high-level run-time services, such as garbage
collection, as follows. When garbage collection is required, control is transferred to
the front-end run-time system (Section 6.1). The garbage collector then walks
the C-- stack, by calling access routines provided by the C-- run-time system
(Section 5.2). In each activation record on the C-- stack, the garbage collector
finds the location of each live variable, using further procedures provided by
the C-- runtime. However, the C-- runtime cannot know which of these
variables holds a pointer. To answer this question, the front-end compiler builds a
statically-allocated data block that identifies pointer variables, and it uses a span
directive (Section 5.3) to associate this data block with the corresponding
procedure’s range of program counter values. The garbage collector combines these
two sources of information to decide whether to treat the procedure’s variable
as a root. Section 5.4 describes one possible garbage collector in more detail.

5.2 The C-- Run-Time Interface

This section presents the core run-time interface provided by the C-- run-time
system. Using this interface, a front-end run-time system can inspect and modify
the state of a suspended C-- computation. Rather than specify representations of
a suspended computation or its activation records, we hide them behind simple
abstractions. These abstractions are presented to the front-end run-time system
through a set of C procedures.

The state of a C-- computation consists of some saved registers and a logical
stack of procedure activations. This logical stack is usually implemented as some
sort of physical stack, but the correspondence between the two may not be very
direct. Notably, callee-saves registers that logically belong with one activation are
not necessarily stored with that activation, or even with the adjacent activation;
they may be stored in the physical record of an activation that is arbitrarily
far away. This problem is the reason that C’s setjmp and longjmp functions
don’t necessarily restore callee-saves registers, which is why some C compilers
make pessimistic assumptions when compiling procedures containing setjmp [17,
19.4].

We hide this complexity behind a simple abstraction, the activation. The idea 
of an activation of procedure P is that it approximates the state the machine will 
be in when control returns to P. The approximation is not completely accurate 
because other procedures may change the global store or P’s stack variables 
before control returns to P. At the machine level, the activation corresponds to 
the “abstract memory” of [27, Chapter 3], which gives the contents of memory, 
including P’s activation record (stack frame), and of registers. 

The activation abstraction hides machine-dependent details and raises the
level of abstraction to the C-- source-code level. In particular, the abstraction
hides:

— The layout of an activation record, and the encoding used to record that
layout for the benefit of the front end runtime.

— The details of manipulating callee-saves registers (whether to use callee-saves
registers is entirely up to the C-- implementation), and

— The direction in which the stack grows.

All of these matters become private to the back end and the C-- runtime.

In the C-- run-time interface, an activation record is represented by an
activation handle, which is a value of type activation. Arbitrary registers and
memory addresses are represented by variables, which are referred to by
number.

The procedures in the C-- run-time interface include:

void *FindVar( activation *a, int var_index ) asks an activation handle
for the location of any parameter or local variable in the activation
record to which the handle refers. The variables of a procedure are indexed
by numbering them in the order in which they are declared in that procedure,
starting with zero. FindVar returns the address of the location containing
the value of the specified variable. The front end is thereby able to examine
or modify the value. FindVar returns NULL if the variable is dead. It is a
checked runtime error to pass a var_index that is out of range.

void FirstActivation( tcb *t, activation *a ). When execution of a
C-- program is suspended, its state is captured by the C-- run-time system.
FirstActivation uses that state to initialise an activation handle that
corresponds to the procedure that will execute when the program’s execution
is resumed.

int NextActivation( activation *a ) modifies the activation handle a to
refer to the activation record of a’s caller, or more precisely, to the
activation to which control will return when a returns. NextActivation returns
nonzero if there is such an activation record, and zero if there is not. That is,
NextActivation(&a) returns zero if and only if activation handle a refers
to the bottom-most record on the C-- stack.

Notice that FindVar always returns a pointer to a memory location, even though
the specified variable might be held in a register at the moment at which
garbage collection is required. But by the time the garbage collector is walking the
stack, the C-- implementation must have stored all the registers away in
memory somewhere, and it is up to the C-- run-time system to figure out where the
variable is, and to return the address of the location holding it.

Names bound by stack declarations are considered variables for purposes
of FindVar, even though they are immutable. For such names, FindVar returns
the value that the name has in C-- source code, i.e., the address of the
stack-allocated block of storage. Storing through this address is meaningful; it alters
the contents of the activation record a. Stack locations are not subject to liveness
analysis.
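As an illustration of how a front-end run-time system might drive these three procedures, the following C sketch walks a toy stack and counts the activations in which variable 0 is live. The procedure names and signatures follow the text; the struct layouts, the `frames` array, and the stub bodies are assumptions made only so that the example compiles and runs. A real C-- runtime would keep all of them private.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for state that a real C-- runtime would hide. */
#define MAX_VARS 4
typedef struct frame {
    int  var_count;          /* number of variables declared in the procedure */
    int  live[MAX_VARS];     /* nonzero if the variable is live */
    long value[MAX_VARS];    /* storage holding each variable's value */
} frame;

typedef struct tcb { frame *frames; int depth; } tcb;          /* assumed layout */
typedef struct activation { tcb *t; int index; } activation;   /* assumed layout */

/* Stub implementations with the signatures given in the text. */
void FirstActivation(tcb *t, activation *a) {
    a->t = t;
    a->index = t->depth - 1;          /* youngest activation */
}

int NextActivation(activation *a) {
    if (a->index == 0) return 0;      /* already at the bottom-most record */
    a->index--;
    return 1;
}

void *FindVar(activation *a, int var_index) {
    frame *f = &a->t->frames[a->index];
    assert(var_index >= 0 && var_index < f->var_count);  /* checked error */
    return f->live[var_index] ? &f->value[var_index] : NULL;
}

/* Client code: walk the stack, counting activations whose variable 0 is
   live. Assumes at least one activation, each with at least one variable. */
int count_live_var0(tcb *t, int *depth_out) {
    activation a;
    int live = 0, depth = 0;
    FirstActivation(t, &a);
    do {
        depth++;
        if (FindVar(&a, 0) != NULL) live++;
    } while (NextActivation(&a));
    *depth_out = depth;
    return live;
}
```

Because FindVar returns an address rather than a value, the same loop could store through the result to update a variable, which is what a compacting collector needs.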

5.3 Front-End Information

Suppose the garbage collector is examining a particular activation record. It can
use FindVar to locate variable number 1, but how can it know whether that
variable is a pointer? The front end compiler cooperates with C-- to answer this
question, as follows:

— The front end builds a static initialised data block (Section 3.3), or
descriptor, that says which of the parameters and local variables of a procedure are
heap pointers. The format of this data block is known only to the front-end
compiler and run-time system; the C-- run-time system does not care.

— The front end tells C-- to associate a particular range of program counters
with this descriptor, using a span directive.

— The C-- run-time system provides a call, GetDescriptor, that maps an
activation handle to the descriptor associated with the program counter at
which the activation is suspended.

We discuss each of these steps in more detail. As an example, suppose we have 
a function f (x, y), with no other variables, in which x holds a pointer into 
the heap and y holds an integer. The front end can encode the heap-pointer 
information by emitting a data block, or descriptor, associating 1 with x and 
0 with y: 

data {
  gc1: bits32 2   /* this procedure has two variables */
       bits8 1    /* x is a pointer */
       bits8 0    /* y is a non-pointer */
}

This encoding does not use the names of the variables; instead, each variable is
assigned an integer index, based on the textual order in which it appears in the
definition of f. Therefore x has index 0 and y has index 1.

Many other encodings are possible. The front end might emit a table that
uses one bit per variable, instead of one byte. It might emit a list of the indices
of variables that contain pointers. It might arrange for pointer variables to have
contiguous indices and emit only the first and last such index.^ The key property
of our design is that the encoding matters only to the front end and its runtime
system. C-- does not know or care about the encoding.

To associate the garbage-collection descriptor with f, the front end places
the definition of f in a C-- span:

span GC gc1 {
  f( bits32 x, bits32 y ) {
    . . . code for f . . .
  }
}

A span may apply to a sequence of function definitions, or to a sequence of
statements within a function definition. In this case, the span applies to all of f.

^ This scenario presumes the front end has the privilege of reordering parameters;
otherwise, it would have to use some other scheme for parameters.

There may be several independent span mappings in use simultaneously, e.g.,
one for garbage collection, one for exceptions, one for debugging, and so on.
C-- uses integer tokens to distinguish these mappings from one another; GC is
the token in the example above. C-- takes no interest in the tokens; it simply
provides a map from a (token, PC) pair to an address. Token values are usually
defined using a const declaration (Section 3.6).

When the garbage collector (say) walks the stack, using an activation handle
a, it can call the following C-- run-time procedure:

void *GetDescriptor( activation *a, int token ) returns the address of
the descriptor associated with the smallest C-- span tagged with token and
containing the program point where the activation a is suspended.

There are no constraints on the form of the descriptor that gc1 labels; that
form is private to the front end and its run-time system. All C-- does is transform
span directives into mappings from program counters to values.

The front end may emit descriptors and spans to support other services, not
just garbage collection. For example, to support exception handling or
debugging, the front end may record the scopes of exception handlers or the names
and types of variables. C-- supports multiple spans, but they must not overlap.
Spans can nest, however; the innermost span bearing a given token takes
precedence. One can achieve the effect of overlapping by binding the same data block
to multiple spans.
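Section 5.3 notes that a front end might use one bit per variable instead of one byte. The following C sketch shows how a front-end run-time system might decode such a descriptor. The struct name, the fixed four-byte bitmap, and the least-significant-bit-first order are illustrative assumptions; nothing here is prescribed by C--, which never looks inside a descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap descriptor: a count followed by one bit per variable. */
struct gc_bitmap_descriptor {
    uint32_t var_count;
    uint8_t  bits[4];     /* room for 32 variables in this sketch */
};

/* Returns 1 if variable var_index is marked as a heap pointer, else 0.
   Bit i of the bitmap (least significant bit first) describes variable i. */
int is_heap_pointer(const struct gc_bitmap_descriptor *d, uint32_t var_index) {
    assert(var_index < d->var_count && var_index < 32);
    return (d->bits[var_index / 8] >> (var_index % 8)) & 1;
}
```

For the function f(x, y) used as the running example, the descriptor would have var_count 2 and a first bitmap byte of 0x01, marking x as a pointer and y as a non-pointer.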



5.4 Garbage Collection 

This section explains in more detail how the C-- run-time interface might be used
to help implement a garbage collector. Our primary concern is how the collector
finds, and possibly updates, roots. Other tasks, such as finding pointers in heap
objects and compacting the heap, can be managed entirely by the front-end
run-time system (allocator and collector) with no support from the back end.
C-- takes no responsibility for heap pointers passed to code written in other
languages. It is up to the front end to pin such pointers or to negotiate changing
them with the foreign code. We defer until Section 6.1 the question of how control
is transferred from running C-- code to the garbage collector.

To help the collector find roots in global variables, the front end can arrange
to deposit the addresses of such variables in a special data section. To find roots
in local variables, the collector must walk the activation stack. For each
activation handle a, it calls GetDescriptor(&a, GC) to get the garbage-collection
descriptor deposited by the front end. The descriptor tells it how many variables
there are and which contain pointers. For each pointer variable, it gets the
address of that variable by calling FindVar. If the result is NULL, the variable is
dead, and need not be traced. Otherwise the collector marks or moves the object
the variable points to, and it may redirect the variable to point to the object’s
new location. Note that the collector need not know which variables were stored
on the stack and which were kept in callee-saves registers; FindVar provides the
location of the variable no matter where it is. Figure 2 shows a simple copying
collector based on [1], targeted to the C-- run-time interface and the descriptors
shown in Section 5.3.



struct gc_descriptor {
  unsigned var_count;
  char heap_ptr[1];
};

void gc(void) {
  activation a;

  FirstActivation(tcb, &a);
  for (;;) {
    struct gc_descriptor *d = GetDescriptor(&a, GC);
    if (d != NULL) {
      int i;
      for (i = 0; i < d->var_count; i++)
        if (d->heap_ptr[i]) {
          typedef void *pointer;
          pointer *rootp = FindVar(&a, i);
          /* copying forward, as in Appel, if live */
          if (rootp != NULL) *rootp = gc_forward(*rootp);
        }
    }
    if (NextActivation(&a) == 0)   /* zero: no more activations */
      break;
  }
  gc_copy();   /* from-space to to-space, as in Appel */
}


Fig. 2. Part of a simple copying garbage collector 



A more complicated collector might have to do more work to decide which
variables represent heap pointers. TIL is the most complicated example we know
of [38]. In TIL, whether a parameter is a pointer may depend on the value of
another parameter. For example, a C-- procedure generated by TIL might look
like this:

f( bits32 ty, bits32 a, bits32 b ) { ... } 

The first parameter, ty, is a pointer to a heap-allocated type record. It is not 
statically known, however, whether a is a heap pointer. At run time, the first field 
of the type record that ty points to describes whether a is a pointer. Similarly, 
the second field of the type record describes whether b is a pointer. 

To support garbage collection, we attach to f’s body a span that points to
a statically allocated descriptor, which encodes precisely the information in the
preceding paragraph. How this encoding is done is a private matter between the
front end and the garbage collector; even this rather complicated situation is
easily handled with no further support from C--.
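One way such a descriptor could be encoded is sketched below in C. This is not TIL's actual representation; the enum, the struct, and the convention that a type record is an array of int flags are all assumptions chosen for illustration. Each variable is described either as statically known, or as depending on a field of the type record held in another variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-variable description emitted by the front end. */
enum ptr_kind { NEVER_PTR, ALWAYS_PTR, DEPENDS_ON_TYPE_RECORD };

struct var_info {
    enum ptr_kind kind;
    int type_var;    /* index of the variable holding the type record */
    int field;       /* field of that type record to consult */
};

/* values[i] is the current value of variable i, as read through the address
   that FindVar returns. Assumed convention: a type record is an array of
   int flags, nonzero meaning "the described variable is a heap pointer". */
int is_pointer(const struct var_info *info, void *const values[], int i) {
    switch (info[i].kind) {
    case ALWAYS_PTR: return 1;
    case NEVER_PTR:  return 0;
    case DEPENDS_ON_TYPE_RECORD: {
        const int *type_record = values[info[i].type_var];
        assert(type_record != NULL);
        return type_record[info[i].field] != 0;
    }
    }
    return 0;
}
```

For f(ty, a, b), the front end would describe ty as ALWAYS_PTR, a as depending on field 0 of variable 0, and b as depending on field 1 of variable 0.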



5.5 Implementing the C-- Run-Time Interface

Can spans and the C-- run-time interface be implemented efficiently? By
sketching a possible implementation, we argue that they can. Because the
implementation is private to the back end and the back-end run-time system, there is wide
latitude for experimentation. Any technique is acceptable provided it implements
the semantics above at reasonable cost. We argue below that well-understood
techniques do just that.



Implementing spans. The span mappings of Section 5.3 take their
inspiration from table mappings for exception handling, and the key procedure,
GetDescriptor, can be implemented in similar ways [10]. The main challenge
is to build a mapping from object-code locations (possible values of the
program counter) to source-code location ranges (spans). The most common way is
to use tables sorted by program counter. If suitable linker support is available,
tables for different tokens can go in different sections, and they will
automatically be concatenated at link time. Otherwise, tables can be chained together
(or consolidated) by an initialisation procedure called when the program starts.
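The lookup that underlies GetDescriptor can be sketched in C as follows. The table layout and the function name are hypothetical, and for brevity the sketch scans the table linearly, keeping the narrowest matching range so that nested spans resolve to the innermost one. An implementation using tables sorted by program counter, as described above, would replace the scan with a binary search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one per span directive. */
struct span_entry {
    int       token;        /* e.g. GC */
    uintptr_t start, end;   /* half-open program-counter range [start, end) */
    void     *descriptor;   /* address of the front end's data block */
};

/* Returns the descriptor of the smallest span that is tagged with token and
   contains pc, or NULL if no such span exists. */
void *lookup_descriptor(const struct span_entry *table, size_t n,
                        int token, uintptr_t pc) {
    const struct span_entry *best = NULL;
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct span_entry *e = &table[i];
        if (e->token != token || pc < e->start || pc >= e->end)
            continue;
        if (best == NULL || e->end - e->start < best->end - best->start)
            best = e;       /* narrower range: a more deeply nested span */
    }
    return best ? best->descriptor : NULL;
}
```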



Implementing stack walking. In our sketch implementation, the call stack is
a contiguous stack of activation records. An activation handle is a static record
consisting of a pointer to an activation record on the stack, together with pointers
to the locations containing the values that the non-volatile registers^ had at the
moment when control left the activation record (Figure 3). FirstActivation
initialises the activation handle to point to the topmost activation record on the
stack and to the private locations in the C-- runtime that hold the values of
registers. Depending on the mechanism used to suspend execution, the runtime
might have values of all registers or only of non-volatile registers, but this detail
is hidden behind the run-time interface. [27] discusses retargetable stack walking
in Chapters 3 and 8.

The run-time system executes only when execution of C-- procedures is
suspended. We assume that C-- execution is suspended only at a “safe point.”
Broadly speaking, a safe point is a point at which the C-- run-time system is
guaranteed to work; we discuss the details in Section 6.2. For each safe point,
the C-- code generator builds a statically-allocated activation-record descriptor
that gives:

^ The non-volatile registers are those registers whose values are unchanged after return
from a procedure call. They include not only the classic callee-saves registers, but
also registers like the frame pointer, which must be saved and restored but which
aren’t always thought of as callee-saves registers.

The activation handle points to an activation record, which
may contain values of some local variables. Other local vari- 
ables may be stored in callee-saves registers, in which case 
their values are not saved in the current activation record, 
but in the activation records of one or more called procedures. 

These activation records can’t be determined until run time, 
so the stack walker incrementally builds a map of the loca- 
tions of callee-save registers, by noting the saved locations of 
each procedure. 

Fig. 3. Walking a stack 

— The size of the activation record; NextActivation can use this to move to 
the next activation record. 

— The liveness of each local variable, and the locations of live variables, indexed
by variable number. The “location” of a live variable might be an offset
within the activation record, or it might be the name of a callee-saves register.
FindVar uses this “location” to find the address of the true memory location
containing the variable’s value, either by computing an address within the
activation record itself, or by returning the address of the location holding
the appropriate callee-saves register, as recorded in the activation handle
(Figure 3).

— If the safe point is a call site, the locations where the callee is expected to 
put results returned from the call. 

— The locations where the caller’s callee-saves registers may be found. Again,
these may be locations within the activation record, or they may be this
activation’s callee-saves registers. NextActivation uses this information to
update the pointers-to-callee-saves-registers in the activation handle.

The C-- runtime can map activations to descriptors using the same
mechanism it uses to implement the span mappings of Section 5.3. The run-time
interface can cache these descriptors in the activation handle, so the lookup
need be done only when NextActivation is called, i.e., when walking the stack.
An alternative that avoids the lookup is to store a pointer to the descriptor in
the code space, immediately following the call, and for the call to return to the
instruction after the pointer. The SPARC C calling convention uses a similar
trick for functions returning structures [32, Appendix D].
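To make the descriptor contents and the cached-descriptor idea concrete, here is a C sketch of how a back-end run-time system might resolve a variable's location. Every name and layout below is hypothetical (including find_var, which only stands in for the interface's FindVar), and the sketch assumes four callee-saves registers; a real back end is free to choose something entirely different.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CALLEE_SAVES 4      /* assumed register count */

enum loc_kind { LOC_DEAD, LOC_FRAME_OFFSET, LOC_CALLEE_SAVES };

/* Where a variable lives at one safe point. */
struct var_loc {
    enum loc_kind kind;
    int where;                  /* byte offset in the frame, or register number */
};

/* Hypothetical activation-record descriptor, one per safe point. */
struct ar_descriptor {
    size_t frame_size;          /* lets the walker step to the next record */
    int var_count;
    const struct var_loc *vars; /* indexed by variable number */
};

/* Hypothetical activation handle with the descriptor cached in it. */
struct handle {
    char *frame;                          /* base of the activation record */
    void *saved_reg[NUM_CALLEE_SAVES];    /* where each register was saved */
    const struct ar_descriptor *desc;     /* cached by the stack walker */
};

/* Returns the address holding the variable's value, or NULL if it is dead. */
void *find_var(struct handle *a, int var_index) {
    assert(var_index >= 0 && var_index < a->desc->var_count);
    const struct var_loc *v = &a->desc->vars[var_index];
    switch (v->kind) {
    case LOC_FRAME_OFFSET: return a->frame + v->where;
    case LOC_CALLEE_SAVES: return a->saved_reg[v->where];
    case LOC_DEAD:         break;
    }
    return NULL;
}
```

The saved_reg array plays the role of the pointers-to-callee-saves-registers: a walker would update it on each step using the last item in the list above.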

The details of descriptors and mapping of activations to descriptors are im- 
portant for performance. At issue is the space overhead of storing descriptors 
and maps, and the time overhead of finding descriptors that correspond to PCs. 
[19] suggests that sharing descriptors between different call sites has a significant 
impact on performance. Because these details are private between the back end 
and the back-end run-time system, we can experiment with different techniques 
without changing the approach, the run-time interface, or the front end. 



6 Refining the Design

The basic idea of providing a run-time interface that allows the state of a
suspended C-- computation to be inspected and modified seems quite flexible and
robust. But working out the detailed application of this idea to a variety of
run-time services, and specifying precisely what the semantics of the resulting
language is, remains challenging. In this section we elaborate some of the
details that were not covered in the preceding section, and discuss mechanisms
that support run-time services other than garbage collection. Our design is not
finalised, so this section is somewhat speculative.

6.1 Suspension and Introspection 

All our intended high-level run-time services must be able to suspend a C--
computation, inspect its state, and modify it, before resuming execution.

In many implementations of high-level languages, the run-time system runs
on the same physical stack as the program itself. In such implementations,
walking the stack or unwinding the stack requires a thorough understanding of
system calling conventions, especially if an interrupt can cause a transfer of control
from generated code to the run-time system. We prefer not to expose this
implementation technique through the C-- run-time interface, but to take a more
abstract view. The C-- runtime therefore operates as if the generated code and
the run-time system run on separate stacks, as separate threads:

— The system thread runs on the system stack supplied by the operating
system. The front-end run-time system runs in the system thread, and it can
easily inspect and modify the state of the C-- thread.

— The C-- thread runs on a separate C-- stack. When execution of the C--
thread is suspended, the state of the C-- thread is saved in the C--
thread-control block, or TCB.

We have to say how a C-- thread is created, and how control is transferred
between the system thread and a C-- thread.

— The system thread calls InitTCB to create a new thread. In addition to
passing the program counter for a C-- procedure without parameters, the system
thread must provide space for a stack on which the thread can execute, as
well as space for a thread-control block.

— The system thread calls Resume to transfer control to a suspended C--
thread.

— Execution of a C-- thread continues until that thread calls the C-- procedure
yield, which suspends execution of the C-- thread and causes a return from
the system thread’s Resume call. The C-- thread passes a yield code, which
is returned as the result of Resume.
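The control transfer described in this list can be pictured with the following C sketch of a system-thread driver loop. The text does not give signatures for Resume or InitTCB, so the signature used here, the yield codes, and the toy tcb that simulates an allocating C-- thread are all assumptions. The stand-in Resume simply runs the toy thread until it would call yield.

```c
#include <assert.h>

enum { DONE = 0, GC = 1 };     /* hypothetical yield codes */

/* Toy stand-in for a thread-control block: the simulated C-- thread
   performs `remaining` allocations of 12 bytes each. */
typedef struct tcb {
    int hp, heap_limit;        /* allocation pointer and its bound */
    int remaining;             /* allocations still to perform */
} tcb;

/* Stand-in for Resume: runs the toy thread until it yields, and returns
   the yield code (assumed signature). */
int Resume(tcb *t) {
    while (t->remaining > 0) {
        if (t->hp + 12 > t->heap_limit)
            return GC;         /* corresponds to yield(GC) in the C-- code */
        t->hp += 12;
        t->remaining--;
    }
    return DONE;
}

/* Stand-in collector: pretends the whole heap is garbage. */
static void collect(tcb *t) { t->hp = 0; }

/* System-thread driver: resumes the thread until it finishes and returns
   the number of collections performed. Assumes heap_limit >= 12. */
int run(tcb *t) {
    int collections = 0;
    for (;;) {
        int code = Resume(t);
        if (code == DONE)
            return collections;
        assert(code == GC);    /* the only other code in this sketch */
        collect(t);
        collections++;
    }
}
```

The driver never inspects the thread while it is running; it acts only between a return from Resume and the next call, which is when the C-- thread is suspended.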

For example, garbage collection can be invoked via a call to yield, when the 
allocator runs out of space. Here is how the code might look if allocation takes 
place in a single contiguous area, pointed to by a heap pointer hp, and bounded 
by heap_limit: 

f( bits32 a,b,c ) {
  while (hp+12 > heap_limit) {
    yield( GC );   /* Need to GC */
  }
  hp = hp+12;

}

It may seem unusual, even undesirable, to speak of two “threads” in a completely
sequential setting. In a more tightly-integrated system it would be more usual
simply to call the garbage collector. But simply making a foreign call to the
garbage collector will not work here. How is the garbage collector to find the top
of the C-- portion of the stack that it must traverse? What if live variables (such
as a, b, c) are stored in C’s callee-saves registers across the call to the garbage
collector? Such complications affect not only the garbage collector, but any
high-level run-time service that needs to walk the stack. Our two-thread conceptual
model abstracts away from these complications by allowing the system thread
to inspect and modify a tidily frozen C-- thread.

Using “threads” does not imply a high implementation cost. Though we call
them threads, “coroutines” may be a more accurate term. The system thread
never runs concurrently with the C-- thread, and the two can be implemented
by a single operating-system thread.

Another merit of this two-thread view is that it extends smoothly to
accommodate multiple C-- threads. Indeed, though it is not the focus of this paper,
we intend that C-- should support many very lightweight threads, in the style
of Concurrent ML [29], Concurrent Haskell [23], and many others.

6.2 Safe Points

When can the system thread safely take control? We say a program-counter 
value within a procedure is a safe point if it is safe to suspend execution of the 
procedure at that point, and to inspect and modify its variables. We require the 
following precondition for execution in the front-end run-time system: 

Ac — thread can be suspended only at a safe point. 

A call to yield must be a safe point, and because any procedure could call 
yield, the code generator must ensure that every call site is a safe point. This 
safe point is associated with the state in which the call has been made and 
the procedure is suspended awaiting the return. C — does not guarantee that 
every instruction is a safe point; recording local-variable liveness and location 
information for every instruction might increase the size of the program by a 
significant fraction [35]. 

So far we have suggested that a C-- program can only yield control voluntarily, 
through a yield call. What happens if an interrupt or fault occurs, transferring 
control to the front-end run-time system, and the currently executing C-- 
procedure is not at a safe point? This may happen if a user deliberately causes 
an interrupt, e.g., to request that the stack be unwound or the debugger invoked. 
It may happen if a hardware exception (e.g., divide by zero) is to be converted 
to a software exception. It may happen in a concurrent program if timer 
interrupts are used to pre-empt threads. The answer to the question remains a 
topic for research; asynchronous pre-emption is difficult to implement, not only 
in C-- but in any system. [11] and [31] discuss some of the problems. One common 
technique is to ensure that every loop is cut by a safe point, and to permit an 
interrupted program to execute until it reaches a safe point. C-- therefore 
enables the front end to insert safe points, by inserting the C-- statement 

safepoint; 
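
The technique of cutting every loop with a safe point can be illustrated as follows. This is a minimal sketch, not C-- itself: the `Interrupt` class and `run_loop` function are hypothetical, and the asynchronous interrupt is simulated by setting a flag in the middle of an iteration.

```python
class Interrupt:
    """A pending-interrupt flag, honoured only at safe points."""

    def __init__(self):
        self.pending = False


def run_loop(n, interrupt, raise_at):
    """Sum 0..n-1; the end of each iteration plays the role of a safe point."""
    total = 0
    for i in range(n):
        if i == raise_at:
            interrupt.pending = True  # the interrupt arrives mid-iteration
        total += i  # work between safe points always runs to completion
        # Safe point: the only place where the loop may be suspended.
        if interrupt.pending:
            return ("suspended", i, total)
    return ("finished", n, total)


result = run_loop(10, Interrupt(), raise_at=3)
```

The interrupted iteration finishes its work before control is handed over, so the state seen by the run-time system is always the state at a safe point.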

6.3 Call-Site Invariants 

In the presence of garbage collection and debugging, calls have an unusual prop- 
erty: live local variables are potentially modified by any call. For example, a com- 
pacting garbage collector might modify pointers saved across a call. Consider 
this function, in which a+8 is a common subexpression: 

    f( bits32 a ) { 
        bits32[a+8] = 10;   /* put 10 in 32-bit word at address a+8 */ 
        g( a ); 
        bits32[a+8] = 0;    /* put 0 in 32-bit word at address a+8 */ 
        return; 
    } 

If g invokes the garbage collector, the collector might modify a during the call 
to g, so the code generator must recompute a+8 after the call; it would be 
unsafe to save a+8 across the call. The same constraint supports a debugger that 
might change the values of local variables. Calls may also modify C-- values that 
are declared to be allocated on the stack. 
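
To see why a saved a+8 goes stale, consider the toy simulation below. It is a sketch under assumptions: the `Heap` class and its `compact` method are hypothetical stand-ins for a compacting collector that moves one object and updates the roots pointing to it.

```python
class Heap:
    """A toy heap: a flat list of words; objects are named by base address."""

    def __init__(self, size):
        self.words = [0] * size

    def compact(self, roots, old_base, new_base, length):
        """Move one object and update every root that points to it."""
        self.words[new_base:new_base + length] = self.words[old_base:old_base + length]
        self.words[old_base:old_base + length] = [0] * length
        return [new_base if r == old_base else r for r in roots]


heap = Heap(64)
a = 32                    # base address of an object
heap.words[a + 8] = 10    # first store, before the call

stale = a + 8             # common subexpression saved across the call (unsafe)
(a,) = heap.compact([a], old_base=32, new_base=0, length=16)  # the call collects

fresh = a + 8             # recomputed after the call, as the constraint requires
value_at_stale = heap.words[stale]  # the old location no longer holds the field
value_at_fresh = heap.words[fresh]  # the field moved with the object
heap.words[fresh] = 0     # second store reaches the object's new location
```

A store through `stale` would have written into memory the object no longer occupies.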

A compiler writer might reasonably object to the performance penalty imposed by 
this constraint; the back end pays for compacting garbage collection whether the 
front end needs it or not. To eliminate this penalty, the front end can mark C-- 
parameters and variables as invariant across calls, using the keyword 
invariant, thus: 

    f ( invariant bits32 a ) { 
        invariant bits16 b; 
        bits32 c; 

        g( a, b, c );   /* "a" and "b" are not modified 
                           by the call, but "c" might be */ 

    } 

The invariant keyword places an obligation on the front-end run-time system, 
not on the caller of f. The keyword constitutes a promise to the C-- compiler 
that the value of an invariant variable will not change “unexpectedly” across a 
call. The run-time system and debugger may not change the values of invariant 
variables. 

If variables will not be changed by a debugger, a front end can safely mark 
non-pointer variables as invariant across calls, and front ends using mostly- 
copying collectors [3,4] or non-compacting collectors [9] can safely mark all vari- 
ables as invariant across calls. 

7 Exceptions and Other Services 

In Section 4 we argued that many high-level run-time services share at least 
some requirements in common. In general, they all need to suspend a running 
C-- thread, and to inspect and modify its state. The spans of Section 5.3 and the 
run-time interface of Section 5.2 provide this basic service, but each high-level 
run-time service requires extra, special-purpose support. Garbage collection is 
enhanced by the invariant annotation of Section 6.3. Exception handling re- 
quires rich mechanisms for changing the flow of control. Interrupt-based profiling 
requires the ability to inspect (albeit in a very modest way) the state of a thread 
interrupted asynchronously. Debugging requires all of the above, and more be- 
sides. We believe that our design can be extended to deal with these situations, 
and that many of C--'s capabilities will be used by more than one high-level 
service. Here we give an indicative discussion of just one other service, exception 
handling. 

Making a single back end support a variety of different exception-handling 
mechanisms is significantly harder than supporting a variety of garbage 
collectors, in part because exceptions alter the control flow of the program. If 
raising an exception could change the program counter arbitrarily, chaos would 
ensue; two different program points may hold their live variables in different 
locations, and they may have different ideas about the layout of the activation 
record and the contents of callee-saves registers. They may even have different 
ideas about which variables are alive and which are dead. In other words, 
unconstrained, dynamic changes in locus of control make life hard for the 
register allocator and the optimiser; if the program counter can change 
arbitrarily, there is no such thing as dead code, and a variable live anywhere 
is live everywhere. 

Typically, handling an exception involves first unwinding the stack to the 
caller of the current procedure, or its caller, etc., and then directing control to 
an exception handler. Many of the mechanisms used for garbage collection are 
also useful for exception handling; for example, stack walking and spans can be 
used to find exactly which handler should catch a particular exception. But the 
mechanisms we have described so far don’t allow for changes in control flow. C-- 
controls such changes by requiring annotations on procedure calls. 

The key idea is that, in the presence of exceptions, a call might return to more 
than one location, and every C-- program specifies explicitly all the locations to 
which a call could return. In effect, a call has many possible continuations instead 
of just one. When an activation is suspended at a call site, three outcomes are 
possible. 

— The call returns normally, and execution continues at the statement following 
the call. 

— The call raises an exception that is handled in the activation, so the call 
terminates by transferring control to a different location in that activation. 

— The call raises an exception that is not handled in the current activation, 
so the activation is aborted, and the run-time system transfers control to a 
handler in some calling procedure. 

C--'s call-site annotations specify these outcomes in detail. 

We are currently refining a design that supports suitable annotations, plus 
a variety of mechanisms for transfer of control [26]. Exception dispatch might 
unwind the stack one frame at a time, looking for a handler, or it might use 
an auxiliary data structure to find the handler, then “cut the stack” directly 
to that handler in constant time. Our design also permits exception dispatch to 
be implemented either in the front-end run-time system or in generated code. 
The C-- run-time system provides supporting procedures that can unwind the 
stack, change the address to which a call returns, and pass values to exception 
handlers. 
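
The two dispatch strategies mentioned above can be contrasted in a small sketch. It does not use the C-- run-time interface: the frame records, `unwind_dispatch`, `cut_dispatch`, and the `handler_index` table are all hypothetical.

```python
def unwind_dispatch(stack, exc):
    """Walk the stack one frame at a time, newest first, looking for a handler."""
    visited = 0
    for depth in range(len(stack) - 1, -1, -1):
        visited += 1
        if exc in stack[depth]["handlers"]:
            return depth, visited
    return None, visited


def cut_dispatch(handler_index, exc):
    """Find the innermost handler in an auxiliary table, then cut straight to it."""
    depths = handler_index.get(exc)
    return depths[-1] if depths else None


# Frames listed from oldest (index 0) to newest; only "main" handles "Overflow".
stack = [
    {"name": "main", "handlers": {"Overflow"}},
    {"name": "f", "handlers": set()},
    {"name": "g", "handlers": set()},
]
# Auxiliary structure kept up to date as handlers are installed.
handler_index = {"Overflow": [0]}

found_by_unwinding, frames_visited = unwind_dispatch(stack, "Overflow")
found_by_cutting = cut_dispatch(handler_index, "Overflow")
```

Both strategies reach the same frame; unwinding touches every intervening frame, whereas cutting pays for a table lookup instead.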



8 Status and Conclusions 

The core design of C-- is stable, and an implementation based on ML-RISC 
is freely available from the authors. This implementation supports the features 
described in Section 3, but it does not yet include the span directive or a C-- 
run-time system. 

Many open questions remain. What small set of mechanisms might support the 
entire gamut of high-level language exception semantics? How about the range of 
known implementation techniques? Support for debugging is even harder than 
dealing with exceptions. Exactly what is the meaning of a breakpoint? How should 
breakpoints interact with optimization? What are the primitive “hooks” required 
for concurrency support? How should C-- cope with pre-emption? 

These questions are not easily answered, but the prize is considerable. Reuse 
of code generators is a critically important problem for language implementors. 
Code generators embedded in C compilers have been widely reused, but the 
nature of C makes it impossible to use the best known implementations of high- 
level run-time services like garbage collection, exception handling, debugging, 
and concurrency — C imposes a ceiling on reuse. 

We hope to break through this ceiling by taking a new approach: design a 
low-level, reusable compiler-target language in tandem with a low-level, reusable 
run-time system. Together, C-- and its run-time system should succeed in hiding 
machine-dependent details of calling conventions and stack-frame layout. They 
should eliminate the distinction between variables living in registers and variables 
living on the stack. By doing so, they should 

— Permit sophisticated register allocation, even in the presence of a garbage 
collector or debugger. 

— Make the results of liveness analyses available at run time, e.g., to a garbage 
collector. 

— Support the best known garbage-collection techniques, and possibly enable 
experimentation with new techniques. 

Although the details are beyond the scope of this paper, we have some reason to 
believe C-- can also support the best known techniques for exception handling, 
as well as supporting profiling, concurrency, and debugging. 

Abstract. In proof checkers and theorem provers (e.g. Coq [4] and ProPre [13]) 
recursive definitions of functions are shown to terminate automatically. In 
standard non-formalised termination proofs of recursive functions, a decreasing 
measure is sometimes used. Such a decreasing measure is usually difficult to 
find. 

By observing the proof trees of the proofs of termination of recursive 
functions in ProPre (the system used in Coq’s proofs of termination), [14] 
finds a decreasing measure which could be used to show termination in 
the standard non-formalised way. This is important because it establishes 
a method to find decreasing measures that help in showing termination. 
As the ProPre system made heavy use of structural rather than inductive 
rules, an extended more powerful version has been built with new proof 
trees based on new rules. 

In this article, we show that the ordinal measures found in [14] lose the 
decreasing property in the extended ProPre system, and then set out to show 
that the extended ProPre system is still suitable for finding measures required 
by other systems (e.g. NQTHM). We do this by showing that there exist other 
measures that can be associated with the proof trees developed in the extended 
ProPre system and that respect the decreasing property. We also show that the 
new parameterised measure functions preserve the decreasing property up to a 
simple condition. 



1 Introduction 

In the verification of programs defined on recursive data structures using 
automated deduction, an important property is that of termination. A recursively 
defined function terminates if there is a well-founded order such that each 
recursive call of the function decreases with respect to this order. Though the 
termination problem is undecidable, several methods have been proposed for 
studying the termination of functional programs. For example, measures are 
used in the well-known NQTHM system of Boyer-Moore [2,3], and in [6] the system 
can deal with measures based on polynomial norms. Though efficient, these 
methods require the measures to be given by the user. Other systems [18,15,19] 
have been developed; these are fully automated, but they use only a fixed 
ordering or lexicographic combinations of the ordering. 

Another approach has been developed in the termination procedure of the 
Coq prover [4] implemented in the ProPre system [11]. The method is automated 
and builds formal proofs because it is based on the Curry-Howard isomorphism, 
from which lambda-terms are extracted that compute the algorithms. In contrast 
with other methods, such as those in [3,6], a notion of right terminal state 
property for proof trees is introduced in the procedure instead of measures. It 
has been shown in [14] that once a termination proof is made, it is then 
possible to find a decreasing measure related to each proof tree. The measures 
(called ramified measures) characterize in some sense the orders found by the 
ProPre system, which differ from the lexicographic combinations of one single 
fixed ordering. Moreover it has been shown that these measures could be 
automatically given for the NQTHM system. 

However, a difficult task for the system of [11] is to establish the 
termination of the automated construction of the proof trees. A drawback of 
that system is that it is not easy to derive efficient rules in a formal context. 
More particularly, the method in [11] is restricted to one general structural 
rule, which limits the right terminal state property of proof trees. 

To circumvent these drawbacks, the formal logical framework behind the 
method in [11] has been extended to give rise to a new system [12] that uses 
other rules and is equipped with a generalized induction principle. Furthermore, 
an order decision procedure on terms has been introduced outside the proof trees 
that alleviates the search for right terminal state properties. As a 
consequence, the termination method can be used by the system in a far more 
efficient way and the class of formal termination proofs made in the system has 
been considerably enlarged. 

The measures coming from the previous system can also be defined in the new 
system. Unfortunately, they no longer enjoy the decreasing property. 
Therefore, the method of [14] cannot be used in the new ProPre system [12] to 
find suitable measures required by other systems such as NQTHM. We solve this 
problem in this paper by showing that there exist other measures that can be 
associated to the proof trees developed in the system respecting the decreasing 
property. 

Moreover, the order decision procedure mentioned above, that is external to 
the formal proofs in the ProPre system, is based on the so-called size measure. So, 
this measure function could be easily changed or parameterised in the extended 
ProPre system. We also show that, up to a simple condition (Property 4.11), 
the decreasing property of measures will still hold. 

Our work has the following advantages: 

- We establish a method to find the measures needed to establish termination 
  for recursive functions. We extend the system to a more powerful version 
  while retaining the decreasing property of measures. This is important 
  because non-formalised termination proofs usually rely on the decreasing 
  property. 

- As the extended version of ProPre uses the advantageous order decision 
  procedure, which is isolated from the formal proofs (in contrast to being 
  intertwined with them as in the earlier version of ProPre), the measure 
  functions can easily be parameterised or changed. In this paper we show that 
  those measure functions preserve the decreasing property up to a simple 
  condition. This means that the measures (now in a larger class) found by the 
  method of this paper can be used by systems such as NQTHM. 

2 Preliminaries 

We assume familiarity with basic notions of type theory and term rewriting. The 
following definition contains some basic notions needed throughout the paper. 

Definition 2.1. 

1. Sorts, Functions, Sorted Signature. We assume a set $\mathcal{S}$ of sorts 
   and a finite set $\mathcal{F}$ of function symbols (or functions). We use 
   $s, s_1, s_2, \ldots, s', s'', \ldots$ to range over sorts and 
   $f, f_1, f_2, \ldots$ to range over functions. 
   A sorted signature is a finite set $\mathcal{F}$ of functions and a set 
   $\mathcal{S}$ of sorts. 

2. Types, Arities of Functions and Constants. With every function 
   $f \in \mathcal{F}$ we associate a type $s_1, \ldots, s_n \to s$ with 
   $s, s_1, \ldots, s_n \in \mathcal{S}$. The number $n \geq 0$ denotes the 
   arity of $f$. A function is called constant if its arity is 0. 

3. Defined and Constructor Symbols. We assume that the set of functions 
   $\mathcal{F}$ is divided into two disjoint sets $\mathcal{F}_c$ and 
   $\mathcal{F}_d$. Functions in $\mathcal{F}_c$ (which also include the 
   constants) are called constructor symbols or constructors and those in 
   $\mathcal{F}_d$ are called defined symbols or defined functions. 

4. Variables. Let $\mathcal{X}$ be a countable set of variables disjoint from 
   $\mathcal{F}$. We assume that every variable is associated with a sort. 

5. Terms over $F$ and $X$ of sort $s$: $T(F, X)_s$. If $s$ is a sort, $F$ is a 
   subset of $\mathcal{F}$ and $X$ is a certain set of variables, then the set 
   of terms over $F$ and $X$ (simply called terms) of sort $s$, denoted 
   $T(F, X)_s$, is the smallest set where: 

   (a) every element of $X$ of sort $s$ is a term of sort $s$, 

   (b) if $t_1, \ldots, t_n$ are terms of sort $s_1, \ldots, s_n$ respectively, 
       and if $f$ is a function of type $s_1, \ldots, s_n \to s$ in $F$, then 
       $f(t_1, \ldots, t_n)$ is a term of sort $s$. 

   We use $t, l, r, u, v, t_1, l_1, r_1, t_2, \ldots, t', l', r', t'', \ldots$ 
   to range over $T(F, X)_s$. If $X$ is empty, we denote $T(F, X)_s$ by 
   $T(F)_s$. $T(F, X) = \bigcup_{s \in \mathcal{S}} T(F, X)_s$. 

6. Constructor Terms, Ground Terms and Ground Constructor Terms. 
   Recall the set of variables $\mathcal{X}$ and the set of functions 
   $\mathcal{F} = \mathcal{F}_c \cup \mathcal{F}_d$. 

   (a) Elements of $T(\mathcal{F}_c, \mathcal{X})_s$, i.e., terms such that 
       every function symbol which occurs in them is a constructor symbol, are 
       called constructor terms. 

   (b) Elements of $T(\mathcal{F}_c \cup \mathcal{F}_d)_s$, i.e., terms in which 
       no variable occurs, are called ground terms. 

   (c) Elements of $T(\mathcal{F}_c)_s$, i.e., terms which do not have any 
       variables and where every function symbol which occurs in them is a 
       constructor symbol, are called ground constructor terms. 

7. (Sorted) Equations. A sorted equation is a pair $(l, r)_s$ of terms $l$ and 
   $r$ of a sort $s$. We always assume that the equation is sorted and hence, we 
   may drop the term sorted and speak only of equations. An equation $(l, r)_s$ 
   gives rise to a rewrite rule $l \to r$. Although a pair $(l, r)_s$ is 
   oriented it will also be written $l =_s r$. When no confusion occurs, the 
   sort may be discarded from the equation and we write $(l, r)$, $l \to r$ and 
   $l = r$. $l$ (resp. $r$) is called the left (resp. right) hand side of the 
   equation. 

8. Left-Linear Equations. An equation is left-linear iff each variable occurs 
   only once in the left-hand side of the equation. 

9. Non-Overlapping Equations. A set of equations is non-overlapping iff no two 
   left-hand sides unify with each other. 

10. Specification or Constructor System. A specification of a function 
    $f : s_1, \ldots, s_n \to s$ in $\mathcal{F}_d$ is a non-overlapping set of 
    left-linear equations $\{(e_1, e'_1)_s, \ldots, (e_p, e'_p)_s\}$ such that 
    for all $1 \leq i \leq p$, $e_i$ is of the form $f(t_1, \ldots, t_n)$ with 
    $t_j \in T(\mathcal{F}_c, \mathcal{X})_{s_j}$, $j = 1, \ldots, n$, and 
    $e'_i \in T(\mathcal{F}_c \cup \mathcal{F}_d, \mathcal{X})_s$. 
    We use $\mathcal{E}, \mathcal{E}', \ldots$ to range over specifications. 

11. {Constructor, Ground, Ground Constructor} Substitution. A substitution 
    $\sigma$ is a mapping from the set $\mathcal{X}$ of variables to the set of 
    terms $T(\mathcal{F}, \mathcal{X})$, such that for every variable $x$, 
    $\sigma(x)$ and $x$ are of the same sort. A substitution $\sigma$ is called 
    a constructor substitution (respectively ground substitution, ground 
    constructor substitution) if $\sigma(x)$ is a constructor term 
    (respectively ground term, ground constructor term) for any variable $x$. 

12. Recursive Call. Let $\mathcal{E}$ be a specification of a function $f$ with 
    type $s_1, \ldots, s_n \to s$. A recursive call of $f$ is a pair 
    $(f(t_1, \ldots, t_n), f(u_1, \ldots, u_n))$ where $f(t_1, \ldots, t_n)$ is 
    a left-hand side of an equation of $f$ and $f(u_1, \ldots, u_n)$ is a 
    subterm of the corresponding right-hand side. 
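
The notions of Definition 2.1 translate directly into a small data structure. The sketch below is illustrative only: it ignores sorts, and the class names `Var` and `App`, the helper functions, and the sample signature (constructors `le` and `br`, defined symbol `flat`) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fn: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, App]

CONSTRUCTORS = {"le", "br"}  # plays the role of F_c
DEFINED = {"flat"}           # plays the role of F_d


def is_ground(t):
    """Item 6(b): no variable occurs in the term."""
    return isinstance(t, App) and all(is_ground(a) for a in t.args)


def is_constructor_term(t):
    """Item 6(a): every function symbol in the term is a constructor."""
    if isinstance(t, Var):
        return True
    return t.fn in CONSTRUCTORS and all(is_constructor_term(a) for a in t.args)


def subterms(t):
    yield t
    if isinstance(t, App):
        for a in t.args:
            yield from subterms(a)


def recursive_calls(f, lhs, rhs):
    """Item 12: pairs (lhs, u) where u is a subterm of rhs headed by f."""
    return [(lhs, u) for u in subterms(rhs) if isinstance(u, App) and u.fn == f]


a = Var("a")
le = App("le")
lhs = App("flat", (App("br", (le, a)),))
rhs = App("br", (le, App("flat", (a,))))
calls = recursive_calls("flat", lhs, rhs)
```

Here `calls` holds the single recursive call of the equation flat(br(le, a)) = br(le, flat(a)).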

3 The Extended ProPre System 

The extended ProPre system deals with inductive types that are defined with 
second order formulas using first and second order universal quantification, im- 
plication and a general least fixed point operator on predicate variables. The last 
connective aims at improving the efficiency of the extracted programs (see [16]). 

Unlike the previous system [11], a connector symbol $\upharpoonright$ is added 
whose meaning is a connective conjunction used with some restrictions but 
without any algorithmic counterpart. The last property is interesting because it 
first allows the programs not to carry out some unnecessary computations, and 
secondly it can easily support inductive methods (which was not the case in the 
previous system). Combined with the connector $\upharpoonright$, a binary 
relation symbol $\prec$ is added. It corresponds to a well-founded ordering on 
terms which is used for the inductive rule defined in the section. 

Definition 3.1. The language is defined as follows: 

1. Terms. The terms of Definition 2.1.6 constitute the first order part. 

2. Data Symbols. With each sort $s_i$ is associated a unary second order 
   predicate, also called a data symbol and denoted by $D_{s_i}$ or $D_i$, whose 
   meaning is: $t \in T(\mathcal{F}_c)_{s_i}$ iff $D_{s_i}(t)$ holds. 

3. Formulae. A formula is built as follows: 

   (a) if $D$ is a data symbol and $t$ is a term then $D(t)$ is a formula, 

   (b) if $A$ is a formula and $x$ is a variable, then $\forall x A$ is a 
       formula, 

   (c) if $A$ is a formula and $u$, $v$ are terms, then 
       $A \upharpoonright (u \prec v)$ is a formula, 

   (d) if $A$ and $B$ are formulas, then $A \to B$ is a formula. 

   We use $A, B, P, F, F_1, F_2, \ldots$ to range over formulae. 



Notation 3.2. We will use some convenient conventions: 

1. $D_{u \prec t}$ is a shorthand for $D(u) \upharpoonright (u \prec t)$, 

2. $\forall x A \to B$ denotes $\forall x (A \to B)$, 

3. $F_1, \ldots, F_n \to F$ denotes 
   $F_1 \to (F_2 \to \ldots \to (F_n \to F) \ldots)$, 

4. Let $P = F_1, \ldots, F_k, \forall x D'(x), F_{k+1}, \ldots, F_m \to D(t)$ be 
   a formula, then $P_{-D'(x)}$ denotes the formula 
   $F_1, \ldots, F_k, F_{k+1}, \ldots, F_m \to D(t)$. 

Note that the latter notation is correct as it will be used with Definition 3.4. 

Definition 3.3. Let $f : s_1, \ldots, s_n \to s \in \mathcal{F}_d$. The 
termination statement for $f$ is the formula: 
$\forall x_1 (D_{s_1}(x_1) \to \ldots \to \forall x_n (D_{s_n}(x_n) \to D_s(f(x_1, \ldots, x_n))) \ldots)$, 

also written by Notation 3.2 as: 
$\forall x_1 D_1(x_1), \ldots, \forall x_n D_n(x_n) \to D(f(x_1, \ldots, x_n))$. 

In the previous ProPre system, the proofs relied on two fundamental notions: 
the distributing trees and the right terminal state property. In the extended 
version, the distributing trees now include two new rules, called the Struct and 
Ind rules, defined in this section. The definition of the right terminal state 
property (Definition 3.8) is now more sophisticated due to the introduction of 
these rules. 

The ProPre prover makes termination proofs, called I-proofs, with the help of 
some macro-rules (or tactics, or derived rules) of Natural Deduction for Predicate 
Calculus (see [9]). The set of rules and the definition of I-proofs are described 
in [12]. Due to Proposition 3.9 below, we will only need here to define the 
Struct-rule and the Ind-rule which constitute the distributing trees in ProPre. 

Although the earlier ProPre system can prove the termination of many 
algorithms, there are numerous interesting algorithms for which there exist no 
proof trees. For instance, the example below illustrates that the use of the 
Rec-rule defined in [11] can lead to loss of efficiency. Let $Tr$ be the sort of 
trees, with the leaf constant $le : Tr$ and the branch constructor 
$br : Tr, Tr \to Tr$. Consider the specification of the flatten function 
$flat : Tr \to Tr$ given by the following equations: 

    flat(le) = le 
    flat(br(le, a)) = br(le, flat(a)) 
    flat(br(br(a_1, a_2), a)) = flat(br(a_1, br(a_2, a))). 

While the specification cannot be proven to terminate in the previous system 
[11], the termination proof is now easily done in the extended system due to the 
new rules presented below. Note that a single ordering using for instance the 
size measure [18] is not sufficient for the termination proof because of the 
presence of the second recursive call. The flatten function can be proved to 
terminate using a polynomial ordering [10], but such orderings have to be given 
by the user [1]. Therefore methods have been developed in [5,17] that aim at 
synthesising polynomial orderings. 
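
The remark about the size measure can be checked on a concrete instance. The sketch below transcribes the three equations of flat into Python; the tuple encoding of trees and the particular `size` function (a count of constructor occurrences) are assumptions made for illustration and need not coincide with the size measure of [18].

```python
LE = ("le",)


def br(left, right):
    return ("br", left, right)


def flat(t):
    """Direct transcription of the three equations of flat."""
    if t == LE:
        return LE
    _, left, right = t
    if left == LE:
        return br(LE, flat(right))
    _, a1, a2 = left
    return flat(br(a1, br(a2, right)))


def size(t):
    """Number of constructor occurrences in a tree."""
    return 1 if t == LE else 1 + size(t[1]) + size(t[2])


# Third equation: flat(br(br(a1, a2), a)) = flat(br(a1, br(a2, a)))
a1, a2, a = LE, br(LE, LE), LE
lhs_arg = br(br(a1, a2), a)
rhs_arg = br(a1, br(a2, a))
```

The argument of the recursive call in the third equation is a rearrangement of the original argument, so this size count is the same on both sides and cannot serve as a strictly decreasing measure for that call.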

We now introduce the rules that are used in the extended system. Let a sort $s$ 
be given. We then consider all the constants $c_1, \ldots, c_p$ of type $s$, and 
all the constructor functions $C_i : s_{i_1}, \ldots, s_{i_k} \to s$ 
$(i_k \geq 1)$, $i \leq q$, whose range is $s$. Note that the above distinction 
between constants and the other constructors is just a matter of presentation. 
Let also $F(x)$ be a formula where $x$, of sort $s$, is free in $F$. Then: 

1. $\Phi_{c_i}(F)$ denotes $F[c_i/x]$, $i \leq p$, 

2. $\Phi_{C_i}(F)$ denotes 
   $\forall x_{i_1} D_{i_1}(x_{i_1}), \ldots, \forall x_{i_k} D_{i_k}(x_{i_k}) \to F[C_i(x_{i_1}, \ldots, x_{i_k})/x]$, 
   $i \leq q$, where $x_{i_1}, \ldots, x_{i_k}$ are not in $F$, 

3. $\Psi_{C_i}(F)$ denotes 
   $\forall x_{i_1} D_{i_1}(x_{i_1}), \ldots, \forall x_{i_k} D_{i_k}(x_{i_k}), \forall z (D_{z \prec C_i(x_{i_1}, \ldots, x_{i_k})} \to F[z/x]) \to F[C_i(x_{i_1}, \ldots, x_{i_k})/x]$, 
   $i \leq q$, where $z, x_{i_1}, \ldots, x_{i_k}$ are not in $F$. 



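Definition 3.4. Let $P$ be of the form 
$F_1, \ldots, F_k, \forall x D(x), F_{k+1}, \ldots, F_m \to D'(t)$. 
The induction rule for the sort $s$ is a choice between the two following 
rules: 

$$\frac{\Gamma \vdash \Phi_{c_i}(P_{-D(x)}) \;\; i \leq p, \qquad \Gamma \vdash \Phi_{C_j}(P_{-D(x)}) \;\; j \leq q}{\Gamma \vdash P} \; Struct(x)$$ 

$$\frac{\Gamma \vdash \Phi_{c_i}(P_{-D(x)}) \;\; i \leq p, \qquad \Gamma \vdash \Psi_{C_j}(P_{-D(x)}) \;\; j \leq q}{\Gamma \vdash P} \; Ind(x)$$ 

For instance the induction rule Ind on integers is: 

$$\frac{\Gamma \vdash P_{-N(x)}(0) \qquad \Gamma \vdash \forall y N(y), \forall z (N_{z \prec sy} \to P_{-N(x)}(z)) \to P_{-N(x)}(sy)}{\Gamma \vdash P} \; Ind(x)$$ 

The Struct rule has to be considered as reasoning by cases. The above rules 
lead to the following definition. 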

Definition 3.5. A formula $F$ is called an I-formula iff $F$ is of the form 
$H_1, \ldots, H_m \to D(f(t_1, \ldots, t_n))$ with $D$ a data symbol and 
$f \in \mathcal{F}_d$ such that for all $i = 1, \ldots, m$, $H_i$ is of the form 
either $\forall x D'(x)$ or $\forall z (D'_{z \prec u} \to F')$, with $D'$ a data 
symbol, $F'$ an I-formula and $u$ a term. 

Furthermore a formula of the above form $H_i = \forall z (D'_{z \prec u} \to F')$ 
is called a restrictive hypothesis of $F$. 

Note that the above definition is a recursive definition whose initial case can 
be obtained with $H_i = \forall x D'(x)$. The heart $\mathcal{C}(F)$ of the 
formula $F$ will denote the term $f(t_1, \ldots, t_n)$. 

Though a restrictive hypothesis is not an I-formula, we will also say that $H'$ 
is a restrictive hypothesis of another restrictive hypothesis 
$\forall z (D'_{z \prec s} \to F')$ if $H'$ is a restrictive hypothesis of the 
I-formula $F'$. Finally $\mathcal{C}(\forall z (D'_{z \prec s} \to F'))$ will be 
$\mathcal{C}(F')$. 

Definition 3.6. Let $\mathcal{E}$ be a specification of a function $f$ of type 
$s_1, \ldots, s_n \to s$. $\mathcal{A}$ is a distributing tree for $\mathcal{E}$ 
iff $\mathcal{A}$ is a proof tree built only with the Struct rule and Ind rule 
such that: 

1. its root is 
   $\vdash \forall x_1 D_1(x_1), \ldots, \forall x_n D_n(x_n) \to D(f(x_1, \ldots, x_n))$ 
   (termination statement). 

2. if $\mathcal{L} = \{\Gamma_1 \vdash \theta_1, \ldots, \Gamma_q \vdash \theta_q\}$ 
   is the set of $\mathcal{A}$'s leaves, then there exists a one to one 
   application $b : \mathcal{L} \to \mathcal{E}$ such that $b(L) = (t, u)$ if and 
   only if $L = (\Gamma \vdash \theta)$ where $\theta$ is an I-formula with 
   $\mathcal{C}(\theta) = t$. 

One can see that the antecedents remain unchanged in the definition of the rules 
Struct and Ind in the ProPre system. Though this is not so usual, it turns out 
that the antecedent formulas are embedded in the consequents. So, as the context 
(i.e. the set of antecedents) is empty in the root of a distributing tree, there 
is no antecedent in each node of the tree. Therefore we will use the notation 
$\theta$ both for $\vdash \theta$ and for the formula itself. One notes that any 
formula in a distributing tree is an I-formula. 

Before stating the right terminal state property that the distributing trees in 
the I-proofs developed in the ProPre system enjoy, we assume that there is a 
well-founded ordering $\sqsubset$ on terms corresponding to the interpretation 
of the relation symbol $\prec$ defined in the language. This ordering is made 
explicit in the next section. We also need the following definition. 

Definition 3.7. We say that an I-formula or restrictive hypothesis $P$ can be 
applied to a term $t$ if $\mathcal{C}(P)$ matches $t$ according to a 
substitution $\sigma$ such that for each variable $x$ occurring free in $P$ we 
have $\sigma(x) = x$. 

Definition 3.8. Let $\mathcal{E}$ be a specification of a function $f$ and 
$\mathcal{A}$ be a distributing tree for $\mathcal{E}$. We say that 
$\mathcal{A}$ satisfies the right terminal state property (r.t.s.p.) iff for all 
leaves $L = \theta$ of $\mathcal{A}$ with $e \in \mathcal{E}$ the equation such 
that $b(L) = e$ ($b$ given in Definition 3.6) and for all recursive calls 
$(t, v)$ of $e$, there exists a restrictive hypothesis 
$P = \forall z D_{z \prec s}, H_1, \ldots, H_k \to D(w)$ of $\theta$ such that 
$P$ can be applied to $v$ according to a substitution $\sigma$ with: 

1. $\sigma(z) \sqsubset s$ and 

2. for all restrictive hypotheses $H$ of $P$ of the form 
   $\forall y D'_{y \prec s'} \to K$ there is a restrictive hypothesis $H_0$ of 
   $\theta$ of the form $\forall y D'_{y \prec s_0} \to K$ such that 
   $\sigma(s') \sqsubseteq s_0$. 

This characterization is due to the following proposition (see [12] for proof). 

Proposition 3.9. There exists an I-proof for $f$ iff there exists a distributing 
tree for $f$ with the right terminal state property. 

Proposition 3.9 says that one can focus only on distributing trees that satisfy 
the right terminal state property. So, as already mentioned, we do not make 
I-proofs explicit here but consider only distributing trees and the right 
terminal state property. 

4 Synthesising Ordinal Measures 

The earlier system built proof trees which have the right terminal state property 
defined in [13]. It has been shown in [14] that one can extract an ordinal 
measure, which will be called the R-measure, from each proof tree. The R-measure 
has the decreasing property if the proof tree satisfies the right terminal state 
property. This measure can also be defined for a proof tree in the new context, 
but the decreasing property of the R-measure is no longer valid. A reason is 
that, as the system ProPre corresponds to an extension of the Recursive 
Definition of the Coq system, the existence of suitable measures no longer 
corresponds to the R-measures. It turns out that if we want to retrieve the 
decreasing property, we need to extend the class of measures to other measures. 

In this section we recall the definition of the R-measures but in the context 
of the extended system, and we present the theorem on the decreasing property 
of the measures that fails but which will be re-established. We then introduce 
the extended measures for which Theorem 1 holds again. 

4.1 The R-measures 

Before giving the ordinal measures we first introduce some definitions concerning 
the judgments in distributing trees. 

Definition 4.1. Let $\mathcal{A}$ be a distributing tree. A branch $B$ from the 
root $\theta_1$ to a leaf $\theta_k$ will be denoted by 
$(\theta_1, x_1), \ldots, (\theta_{k-1}, x_{k-1}), \theta_k$ where $x_i$ 
$(1 \leq i < k)$ is the variable for which either the rule Struct or Ind is 
applied on $\theta_i$. 

Definition 4.2. Let $\mathcal{A}$ be a tree and $\theta$ a node of 
$\mathcal{A}$. The height of $\theta$ in $\mathcal{A}$, denoted by 
$H(\theta, \mathcal{A})$, is the height of the subtree of $\mathcal{A}$ whose 
root is $\theta$ minus one. 

According to the definition of a distributing tree $\mathcal{A}$, we have the two 
following straightforward facts. 

Fact 4.3. Let $\mathcal{E}$ be a specification of a function $f$ of type 
$s_1, \ldots, s_n \to s$ and $\mathcal{A}$ be a distributing tree. For each 
$(t_1, \ldots, t_n) \in T(\mathcal{F}_c)_{s_1} \times \ldots \times T(\mathcal{F}_c)_{s_n}$ 
there exists one and only one leaf $\theta$ of $\mathcal{A}$ and a ground 
constructor substitution $\rho$ such that 
$\rho(\mathcal{C}(\theta)) = f(t_1, \ldots, t_n)$. 

Fact 4.4. For every branch of $\mathcal{A}$ from the root to a leaf 
$(\theta_1, x_1), \ldots, (\theta_{k-1}, x_{k-1}), \theta_k$ and for all 
$i \leq j \leq k$, there exists a constructor substitution $\sigma_{j,i}$ such 
that $\sigma_{j,i}(\mathcal{C}(\theta_i)) = \mathcal{C}(\theta_j)$. 

Definition 4.5. The recursive length of a term $t$ of sort $s$ is defined by: 

1. if $t$ is a constant $c$, then $lg(c) = 0$, 

2. if $t = C(t_1, \ldots, t_n)$ with 
   $C : s_1, \ldots, s_n \to s \in \mathcal{F}_c$ then 
   $lg(t) = 1 + \sum lg(t_j)$. 

Definition 4.6. Let $\mathcal{E}$ be a specification of a function 
$f : s_1, \ldots, s_n \to s$ such that there exists a distributing tree 
$\mathcal{A}$ for $\mathcal{E}$. The R-measure 
$\Omega_R : T(\mathcal{F}_c)_{s_1} \times \cdots \times T(\mathcal{F}_c)_{s_n} \to \omega^{\omega}$, 
where $\omega$ is the least infinite ordinal, is defined as follows: 

Let $t = (t_1, \ldots, t_n)$ be an element of the domain and $\theta$ be the leaf 
of $\mathcal{A}$ such that there is a substitution $\rho$ with 
$\rho(\mathcal{C}(\theta)) = f(t)$ (Fact 4.3). Let $B$ be the branch 
$(\theta_1, x_1), \ldots, (\theta_{k-1}, x_{k-1}), \theta$ of $\mathcal{A}$ from 
the root to $\theta$, and let $\sigma_{r,s}$ be the substitutions of Fact 4.4. 
Then $\Omega_R(t)$ is defined as the following ordinal sum: 

$$\Omega_R(t) = \sum_{i=1}^{k-1} \omega^{H(\theta_i, \mathcal{A})} \cdot lg(\rho(\sigma_{k,i}(x_i)))$$ 

We now need some definitions before giving Theorem 1. 

Definition 4.7. A finite sequence of positive integers $q$ will be called a
position; $\epsilon$ will denote the empty sequence and $\cdot$ the concatenation
operation on sequences.

For each position q and sort s, we will assume there is a new variable of sort s 
indexed by q distinct from those of X. The following definition allows us to state 
Theorem 1 below. 

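Definition 4.8. Let $t$ be a term and $q$ be a position. The term
$\lfloor t \rfloor_q$ is defined as follows: $\lfloor c \rfloor_q = c$ if $c$ is a
constant, $\lfloor x \rfloor_q = x$ if $x$ is a variable,
$\lfloor C(t_1, \ldots, t_n) \rfloor_q = C(\lfloor t_1 \rfloor_{q \cdot 1}, \ldots, \lfloor t_n \rfloor_{q \cdot n})$
if $C \in \mathcal{F}_c$, and $\lfloor f(t_1, \ldots, t_n) \rfloor_q = x_q$ if
$f \in \mathcal{F}_d$.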
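The operation of Definition 4.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are encoded as nested tuples `(symbol, arg1, ..., argn)`, variables as strings, and the fresh variable indexed by a position is written as the string `x_` followed by the dotted position. The encoding, the function name `cap` and the set `defined` are choices made for this sketch and are not part of the paper.

```python
def is_var(term):
    """Variables are encoded as plain strings (an assumption of this sketch)."""
    return isinstance(term, str)


def cap(term, defined, position=()):
    """Replace every outermost subterm headed by a defined symbol by the
    fresh variable indexed by its position, as in Definition 4.8."""
    if is_var(term):
        return term
    symbol, args = term[0], term[1:]
    if symbol in defined:
        # Fresh variable x_q, where q is the current position.
        return "x_" + ".".join(str(i) for i in position)
    # Constructor (or constant): descend, extending the position by the
    # argument index.
    return (symbol,) + tuple(
        cap(arg, defined, position + (i,))
        for i, arg in enumerate(args, start=1)
    )


# The term s(add(x, s(y))) taken at position 1: the call to add sits at
# position 1.1 and is replaced by the variable x_1.1.
u = ("s", ("add", "x", ("s", "y")))
print(cap(u, {"add"}, (1,)))
```

Constants are handled by the constructor case, since a constant has no arguments to descend into.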



Theorem 1. Let $\mathcal{E}$ be a specification of a function
$f : s_1, \ldots, s_n \to s$ and $\mathcal{A}$ be a distributing tree for
$\mathcal{E}$ having the right terminal state property. The associated measure
$\Omega_R$ then satisfies the decreasing property. That is to say, for each
recursive call $(f(t_1, \ldots, t_n), f(u_1, \ldots, u_n))$ of $\mathcal{E}$ and
for every ground constructor substitution $\varphi$ we have:
$\Omega_R(\varphi(t_1), \ldots, \varphi(t_n)) > \Omega_R(\varphi(\lfloor u_1 \rfloor_1), \ldots, \varphi(\lfloor u_n \rfloor_n))$.

Unfortunately, though Theorem 1 holds in the context of R-proofs (see [14]),
examples show that it fails in the current context. Consider, for instance, the
following specification of the addition function $add : nat, nat \to nat$,
which is defined in an unusual way to illustrate our point.

add(s(x), s(y)) = add(s(s(x)), y)
add(0, y) = y
add(s(x), 0) = s(x)

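There exists a tree which enjoys the right terminal state property and leads to
the following measure: $\Omega_R(u, v) = \omega * lg(u) + lg(v)$. Obviously the
decreasing property does not hold: in the recursive call of the first equation
the first argument grows from $s(x)$ to $s(s(x))$, so the coefficient of
$\omega$ increases.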
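The failure can be checked mechanically on small instances. The following Python sketch is only an illustration: it encodes an ordinal $\omega * a + b$ below $\omega^2$ as the pair `(a, b)`, so that Python's tuple comparison coincides with the ordinal order, and it encodes Peano numerals as nested tuples. The helper names are hypothetical and chosen for this sketch.

```python
def peano(n):
    """Build the constructor term s(s(...s(0)...)) with n successors."""
    term = ("0",)
    for _ in range(n):
        term = ("s", term)
    return term


def lg(term):
    """Recursive length on nat: lg(0) = 0 and lg(s(t)) = 1 + lg(t)."""
    return 0 if term[0] == "0" else 1 + lg(term[1])


def omega_r(u, v):
    """The measure omega * lg(u) + lg(v), encoded as the pair (lg(u), lg(v))."""
    return (lg(u), lg(v))


def recursive_call(x, y):
    """Instantiate add(s(x), s(y)) = add(s(s(x)), y) at ground terms x and y."""
    caller = (("s", x), ("s", y))
    callee = (("s", ("s", x)), y)
    return caller, callee


# Collect the ground instances on which the measure fails to decrease.
violations = []
for i in range(4):
    for j in range(4):
        caller, callee = recursive_call(peano(i), peano(j))
        if not omega_r(*caller) > omega_r(*callee):
            violations.append((i, j))

print(len(violations), "of 16 instances violate the decreasing property")
```

Every instance is a violation, because the caller is measured as $\omega * (1 + lg(x)) + 1 + lg(y)$ and the callee as $\omega * (2 + lg(x)) + lg(y)$.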

In the remainder of this section, we introduce new measures that enable the
theorem to be restored.

4.2 The New Ramified Measures

As already mentioned, an ordering relation $\sqsubset$ on terms is introduced in
the extended system. In contrast to the previous system, this relation can be
checked outside of the formal proofs and so can be easily modified independently
of the logical framework of the system. The ordering relation is related to a
measure on terms in the following way.

Definition 4.9. Assume a measure $m$ on the terms ranging over natural numbers.
Let $u, v \in T(\mathcal{F}_c, \mathcal{X})_s$ for a given sort $s$. We say that
$u \sqsubset v$ iff:

1) $m(u) < m(v)$, 2) $Var(u) \subseteq Var(v)$, 3) $u$ is linear.

A special measure, the so-called size measure $lg_1$, is used in the system and
is defined as follows:

Definition 4.10. The size measure of a term $t$ of sort $s$ is given by:

1. if $t$ is a constant or a variable, then $lg_1(t) = 1$,

2. if $t = C(t_1, \ldots, t_n)$ with $C : s_1, \ldots, s_n \to s \in \mathcal{F}_c$
then $lg_1(t) = 1 + lg_1(t_1) + \cdots + lg_1(t_n)$.

Note that Definition 4.13 uses only the value on constructor ground terms for 
the measure m, but this one is also defined on constructor terms because it is 
needed for the termination proofs of the ProPre system. 

In order to be able to prove the decreasing property of the new ordinal measures 
defined below, we will only need to assume a property on the measure m. 

Property 4.11. Let $u, v \in T(\mathcal{F}_c, \mathcal{X})_s$ such that
$u \sqsubset v$. Then for all constructor substitutions $\sigma$, we have
$m(\sigma(u)) < m(\sigma(v))$.

Note that the property obviously holds for $lg_1$. For that, it is enough to
remark that $lg_1(t) - 1 \ge 0$ and
$lg_1(\sigma(t)) = lg_1(t) + \sum_{x \in Var(t)} \#(x, t) * (lg_1(\sigma(x)) - 1)$
for any term $t$, where $Var(t)$ denotes the set of variables which occur in $t$
and $\#(x, t)$ is the number of occurrences of the variable $x$ in $t$.
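The size measure and the identity used in the remark above are easy to experiment with. The Python sketch below is an illustration only: terms are nested tuples `(symbol, arg1, ..., argn)`, variables are strings, and a substitution is a dictionary from variable names to terms. The function `below` implements the three conditions of Definition 4.9 with $m = lg_1$; all names are hypothetical and chosen for this sketch.

```python
def is_var(t):
    """Variables are encoded as plain strings (an assumption of this sketch)."""
    return isinstance(t, str)


def size(t):
    """Size measure lg_1: 1 for a variable or a constant, and 1 plus the
    sizes of the arguments for a constructor application."""
    if is_var(t):
        return 1
    return 1 + sum(size(a) for a in t[1:])


def occurrences(x, t):
    """Number of occurrences of the variable x in t."""
    if is_var(t):
        return 1 if t == x else 0
    return sum(occurrences(x, a) for a in t[1:])


def variables(t):
    """Set of variables occurring in t."""
    if is_var(t):
        return {t}
    result = set()
    for a in t[1:]:
        result |= variables(a)
    return result


def substitute(sigma, t):
    """Apply the substitution sigma (a dict) to the term t."""
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(substitute(sigma, a) for a in t[1:])


def size_after(sigma, t):
    """Right-hand side of the identity for lg_1(sigma(t))."""
    return size(t) + sum(
        occurrences(x, t) * (size(substitute(sigma, x)) - 1)
        for x in variables(t)
    )


def below(u, v):
    """The ordering of Definition 4.9, instantiated with m = lg_1."""
    linear = all(occurrences(x, u) == 1 for x in variables(u))
    return size(u) < size(v) and variables(u) <= variables(v) and linear


t = ("c", "x", ("c", "x", "y"))
sigma = {"x": ("s", ("0",)), "y": ("0",)}
print(size(substitute(sigma, t)), size_after(sigma, t))

u, v = "x", ("s", "x")
print(below(u, v), size(substitute(sigma, u)) < size(substitute(sigma, v)))
```

On this example both sides of the identity evaluate to 7, and the pair `u`, `v` satisfies both the ordering and the inequality of Property 4.11 after substitution.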

It is now necessary to distinguish the sequents coming respectively from an
application of the Struct-rule and the Ind-rule. Therefore we introduce the
following:

Definition 4.12. Let $\theta$ be a judgment in a distributing tree $\mathcal{A}$
and $\theta'$ an immediate child of $\theta$. We say that $\theta$ is decreasing
and $\theta'$ is an Ind-judgment if one comes from the other using the Ind rule.
The test function $\xi$ is defined on each node as follows: $\xi(\theta)$ is 1 if
$\theta$ is a decreasing judgment and 0 if not.

Definition 4.13. Let $\mathcal{E}$ be a specification of a function
$f : s_1, \ldots, s_n \to s$ such that there exists a distributing tree
$\mathcal{A}$ for $\mathcal{E}$. The new ramified measure
$\Omega_I : T(\mathcal{F}_c)_{s_1} \times \cdots \times T(\mathcal{F}_c)_{s_n} \to \omega^\omega$
is defined as follows:

Let $t = (t_1, \ldots, t_n)$ be an element of the domain and $\theta$ be the leaf
of $\mathcal{A}$ such that there is a substitution $\rho$ with
$\rho(\mathcal{C}(\theta)) = f(t)$ (Fact 4.3). Let $B$ be the branch
$(\theta_1, x_1), \ldots, (\theta_{k-1}, x_{k-1}), \theta$ of $\mathcal{A}$ from
the root to $\theta$, and let $\sigma_{r,s}$ be the substitutions of Fact 4.4.
Then

$$\Omega_I(t) = \sum_{i=1}^{k-1} \omega^{H(\theta_i, \mathcal{A})} * \xi(\theta_i) * m(\rho(\sigma_{k,i}(x_i)))$$

Intuition would suggest simply substituting the measure $m$ for the recursive
length $lg$ in Definition 4.6. But once again, examples show that Theorem 1
fails in that case. It is far from obvious that the new ordinal measures enjoy
the decreasing property. However, Theorem 1 does hold with the new measures; the
corresponding version is stated below as Theorem 2.

Theorem 2. Let $\mathcal{E}$ be a specification of a function
$f : s_1, \ldots, s_n \to s$ and $\mathcal{A}$ be a distributing tree for
$\mathcal{E}$ having the right terminal state property. The associated measure
$\Omega_I$ then satisfies the decreasing property. That is to say, for each
recursive call $(f(t_1, \ldots, t_n), f(u_1, \ldots, u_n))$ of $\mathcal{E}$ and
for every ground constructor substitution $\varphi$ we have:
$\Omega_I(\varphi(t_1), \ldots, \varphi(t_n)) > \Omega_I(\varphi(\lfloor u_1 \rfloor_1), \ldots, \varphi(\lfloor u_n \rfloor_n))$.

Proof: The proof is long but it can be derived from the main Proposition 5.25
below. The reader is referred to [8] for a detailed proof of Theorem 2. □

Now that we have Theorem 2, we can extract from an automated termination
proof of the flatten function defined in Section 3 the following ordinal measure,
which has the decreasing property:

$\Omega_I(le) = \omega$
$\Omega_I(br(le, a)) = \omega * (1 + lg_1(a))$
$\Omega_I(br(br(a, b), c)) = \omega * (2 + lg_1(a) + lg_1(b) + lg_1(c)) + 1 + lg_1(a) + lg_1(b)$
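The decreasing property of this measure can be tested exhaustively on small trees. The sketch below is illustrative and rests on an explicit assumption: it takes flatten to be specified by the usual equations flatten(le) = le, flatten(br(le, a)) = br(le, flatten(a)) and flatten(br(br(a, b), c)) = flatten(br(a, br(b, c))), which is not restated in this section. An ordinal $\omega * a + b$ is encoded as the pair `(a, b)`, compared lexicographically.

```python
from itertools import product

LE = ("le",)


def br(left, right):
    """Binary tree constructor."""
    return ("br", left, right)


def size(t):
    """Size measure lg_1 on ground trees."""
    return 1 + sum(size(a) for a in t[1:])


def measure(t):
    """The extracted measure, with omega * a + b encoded as (a, b)."""
    if t == LE:
        return (1, 0)
    _, left, right = t
    if left == LE:
        return (1 + size(right), 0)
    _, a, b = left
    return (2 + size(a) + size(b) + size(right), 1 + size(a) + size(b))


def trees(depth):
    """All binary trees of height at most depth."""
    if depth == 0:
        return [LE]
    smaller = trees(depth - 1)
    return [LE] + [br(l, r) for l, r in product(smaller, smaller)]


def recursive_calls(t):
    """Arguments of the recursive calls of the ASSUMED flatten on input t."""
    if t == LE:
        return []
    _, left, right = t
    if left == LE:
        return [right]                  # flatten(br(le, a)) calls flatten(a)
    _, a, b = left
    return [br(a, br(b, right))]        # rotation case


checked = 0
for t in trees(3):
    for u in recursive_calls(t):
        assert measure(t) > measure(u), (t, u)
        checked += 1

print(checked, "recursive calls checked, all decreasing")
```

In the rotation case with a non-leaf left subtree the coefficient of $\omega$ stays the same, so the finite part $1 + lg_1(a) + lg_1(b)$ is what makes the measure decrease.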

5 The Analysis of the I-formulas 

This section is devoted to the analysis of the I-formulas. Due to the shape of
the distributing trees and the I-formulas that appear in the branches, we need to
introduce some definitions and to establish several lemmas which will be used for
the proof of Theorem 2 and Proposition 5.25.

Definition 5.1. For a term $t$ and a subterm $u$ of $t$ that has only one
occurrence in $t$, $u \triangleright t$ will denote the position of $u$ in $t$.



Definition 5.2. $\mathcal{R}(F)$ denotes the set of the restrictive hypotheses of
an I-formula $F$ and, for $P = \forall z\,(Dz_{\prec s} \to F')$ with $F'$ an
I-formula, we define $\mathcal{R}(P) = \mathcal{R}(F')$. For $P_i$ and $P_j$ in
$\mathcal{R}(F)$ we say that $P_i$ is before $P_j$ if $F$ can be written
$P_1, \ldots, P_k \to D(t)$ with $1 \le i < j \le k$. Moreover, for a restrictive
hypothesis $P$ of $F$,
$\#(P, F) = 1 + card\{P' \in \mathcal{R}(F) \mid P' \text{ before } P\}$.

One can easily see that, if $\theta'$ is an immediate antecedent of $\theta$ in a
distributing tree, then each restrictive hypothesis of $\theta$ corresponds to a
restrictive hypothesis in $\theta'$. A new restrictive hypothesis is also in
$\theta'$ if the rule is Ind. Formally we have the following definition.

Definition 5.3. Let $\theta$ be a judgment in a distributing tree and $\theta'$
an immediate antecedent of $\theta$. We define an injective application
$\mathcal{R}es_{\theta',\theta} : \mathcal{R}(\theta) \to \mathcal{R}(\theta')$
with $\mathcal{R}es_{\theta',\theta}(P)$ the restrictive hypothesis $P'$ in
$\mathcal{R}(\theta')$ such that $\#(P', \theta') = \#(P, \theta)$.

$\mathcal{R}es_{\theta',\theta}(P)$ can be seen as the residual of $P$ in
$\theta'$ and therefore the application can be generalized to any antecedent
$\theta'$ of $\theta$ using composition of applications.

Definition 5.4. For an Ind-judgment $\theta'$ in a distributing tree, the
restrictive hypothesis $P$ in $\theta'$ such that
$\#(P, \theta') = card(\mathcal{R}(\theta'))$ is called the new hypothesis,
denoted by $\mathcal{N}(\theta')$. In particular, it is such that all other
restrictive hypotheses in $\theta'$ are before $P$.

Remark 5.5. We can remark that if $\theta$ is a decreasing judgment with $x$ the
induction variable and $\theta'$ an immediate antecedent then
$x \triangleright \mathcal{C}(\theta) = z \triangleright \mathcal{C}(\mathcal{N}(\theta'))$
where the new hypothesis $\mathcal{N}(\theta')$ is of the form
$\forall z\,(Dz_{\prec s} \to H)$. This will be used for the proof of
Proposition 5.25.

If $\theta'$ is an immediate antecedent of a decreasing judgment $\theta$, we
know that $\theta'$ is of the form:
$\forall x_1 D_1(x_1), \ldots, \forall x_k D_k(x_k), \mathcal{N}(\theta') \to \theta_{-D(x)}[w/x]$,
with $\mathcal{N}(\theta') = \forall z\,(Dz_{\prec w} \to \theta_{-D(x)}[z/x])$.
So, for an Ind-judgment $\theta'$, we can easily define the application
$\mathcal{D}_{\theta'} : \mathcal{R}(\mathcal{N}(\theta')) \to \mathcal{R}(\theta')$
where $\mathcal{D}_{\theta'}(Q)$ is the restrictive hypothesis $Q'$ of
$\theta_{-D(x)}[w/x]$ with
$\#(Q', \theta_{-D(x)}[w/x]) = \#(Q, \theta_{-D(x)}[z/x])$. We can say that
$\mathcal{D}$ is a duplication of restrictive hypotheses.

Lemma 5.6. Let $P = \forall z\,(Dz_{\prec s}, H_1, \ldots, H_k \to D(t))$ be a
restrictive hypothesis of a judgment $\theta$ in a distributing tree. Then

1) the variables of $s$ are free in $P$ and have no other occurrences in $P$,

2) the variables in $P$ distinct from those in $s$ are bound in $P$,

3) $s$ is a subterm of $\mathcal{C}(\theta)$ and
$s \triangleright \mathcal{C}(\theta) = z \triangleright \mathcal{C}(P)$.

Proof: See [12]. □ 

Definition 5.7. Let $G$ and $F$ be two restrictive hypotheses. We define a
congruence relation $\approx$ as follows: $F$ and $G$ are said to be similar,
denoted by $F \approx G$, if they are respectively of the form
$\forall z\,(D(z)_{\prec s} \to H)$ and $\forall z\,(D(z)_{\prec t} \to H)$.

Lemma 5.8. Let $\theta$ be an Ind-judgment in a distributing tree and $P$ a
restrictive hypothesis of $\mathcal{N}(\theta)$. Then
$\mathcal{D}_\theta(P) \approx P$.

Proof: According to the form of $\mathcal{N}(\theta)$ (see the definition of
$\mathcal{D}_{\theta'}$), we know that $P$ and $\mathcal{D}_\theta(P)$ are
respectively of the form $\forall y\,(D'(y)_{\prec s'} \to H')[z/x]$ and
$\forall y\,(D'(y)_{\prec s'} \to H')[w/x]$. Lemma 5.6 says that $x$ does not
occur in $H'$ (but may possibly occur in $s'$). Therefore
$P = \forall y\,(D'(y)_{\prec s'[z/x]} \to H')$ and
$\mathcal{D}_\theta(P) = \forall y\,(D'(y)_{\prec s'[w/x]} \to H')$, thus
$\mathcal{D}_\theta(P) \approx P$. □

Lemma 5.9. Let $P$ be a restrictive hypothesis of $\theta$ in a distributing
tree, and $\theta'$ an antecedent of $\theta$. Then
$\mathcal{R}es_{\theta',\theta}(P) \approx P$.

Proof: By induction on the branch between $\theta$ and $\theta'$. □

Corollary 5.10. If $\theta$ is a judgment in a distributing tree, $\theta'$ an
immediate antecedent of $\theta$, and $P$ a restrictive hypothesis of $\theta$,
then $\mathcal{R}(\mathcal{R}es_{\theta',\theta}(P)) = \mathcal{R}(P)$.

Proof: By Lemma 5.9, we have $P = \forall z\,(D(z)_{\prec s_1} \to F)$ and
$\mathcal{R}es_{\theta',\theta}(P) = \forall z\,(D(z)_{\prec s_2} \to F)$. Thus
$\mathcal{R}(P) = \mathcal{R}(F) = \mathcal{R}(\mathcal{R}es_{\theta',\theta}(P))$. □

Lemma 5.11. For every judgment $\theta$ in a distributing tree, there do not
exist two similar restrictive hypotheses in $\theta$.

Proof: See [8]. □

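Definition 5.12. Let $\theta$ be a judgment in a distributing tree and
$\theta_1, \ldots, \theta_n = \theta$ the consecutive judgments from the root
$\theta_1$ to $\theta$. Let $P$ be a restrictive hypothesis of $\theta$. We write
$\mathcal{J}(P)$ for the first integer $j$ such that there is
$Q \in \mathcal{R}(\theta_j)$ with $P = \mathcal{R}es_{\theta,\theta_j}(Q)$,
which is well defined since $\mathcal{R}es_{\theta,\theta}(P) = P$.

Since every application $\mathcal{R}es_{\theta',\theta}$ is injective,
$\mathcal{R}es^{-1}_{\theta',\theta}(P)$ will denote the antecedent of $P$, with
the assumption that $P$ is in the image of the application.

Lemma 5.13. In the context of the previous definition, the rule between
$\theta_{\mathcal{J}(P)}$ and $\theta_{\mathcal{J}(P)-1}$ is the Ind-rule, and
$\mathcal{R}es^{-1}_{\theta,\theta_{\mathcal{J}(P)}}(P) = \mathcal{N}(\theta_{\mathcal{J}(P)})$.

Proof: The opposite leads to a contradiction with the definition of
$\mathcal{J}(P)$. □

Corollary 5.14. Let $P$ be a restrictive hypothesis of a judgment $\theta$ in a
distributing tree. Then, using also Corollary 5.10, we have

$\mathcal{R}(P) = \mathcal{R}(\mathcal{R}es^{-1}_{\theta,\theta_{\mathcal{J}(P)}}(P)) = \mathcal{R}(\mathcal{N}(\theta_{\mathcal{J}(P)}))$.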



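Definition 5.15. Let $\theta$ be a judgment in a distributing tree and $P$ be a
restrictive hypothesis of $\theta$. Then we can now establish the following
diagram and thereby define the application
$\Upsilon_{P,\theta} : \mathcal{R}(P) \to \mathcal{R}(\theta)$, with
$\Upsilon_{P,\theta} = \mathcal{R}es_{\theta,\theta_{\mathcal{J}(P)}} \circ \mathcal{D}_{\theta_{\mathcal{J}(P)}}$:

$$
\begin{array}{ccc}
\mathcal{R}(P) & \xrightarrow{\ \Upsilon_{P,\theta}\ } & \mathcal{R}(\theta) \\
{\scriptstyle Id}\ \downarrow & & \uparrow\ {\scriptstyle \mathcal{R}es_{\theta,\theta_{\mathcal{J}(P)}}} \\
\mathcal{R}(\mathcal{R}es^{-1}_{\theta,\theta_{\mathcal{J}(P)}}(P)) & \xrightarrow{\ \mathcal{D}_{\theta_{\mathcal{J}(P)}}\ } & \mathcal{R}(\theta_{\mathcal{J}(P)})
\end{array}
$$

In the case where $\theta$ is an Ind-judgment and $P = \mathcal{N}(\theta)$, then
$\theta_{\mathcal{J}(P)} = \theta$ and $\Upsilon_{P,\theta} = \mathcal{D}_\theta$.
So, $\Upsilon$ can be seen as a generalization of $\mathcal{D}$ for all
restrictive hypotheses of any $\theta$.

Fact 5.16. We remark that $\Upsilon_{P,\theta}$ is injective by composition of
injective applications. Moreover, according to Lemmas 5.8 and 5.9,
$\Upsilon_{P,\theta}(Q) \approx Q$ for all $Q \in \mathcal{R}(P)$.

Lemma 5.17. For a restrictive hypothesis $P$ of a judgment $\theta$ in a
distributing tree and $Q$ a restrictive hypothesis of $P$, we have
$\mathcal{J}(P) > \mathcal{J}(\Upsilon_{P,\theta}(Q))$.

Proof: See [8]. □

Lemma 5.18. Let $\mathcal{A}$ be a distributing tree for a specification of a
function, having the right terminal state property. Let $\theta$ be a leaf of
$\mathcal{A}$ and $(t, v)$ be a recursive call of $\mathcal{C}(\theta)$. In this
context, if $P$ is the restrictive hypothesis of $\theta$ satisfying Definition
3.8 of the r.t.s.p. of $\mathcal{A}$, and $H$ and $H_0$ satisfy point 2) of
Definition 3.8 with the same notations, then $\Upsilon_{P,\theta}(H) = H_0$ and
$\mathcal{J}(P) > \mathcal{J}(H_0)$.

Proof: According to point 2) of Definition 3.8, we have $H \approx H_0$.
Furthermore, by Fact 5.16, $\Upsilon_{P,\theta}(H) \approx H$. Hence Lemma 5.11
gives us that $\Upsilon_{P,\theta}(H) = H_0$ and then
$\mathcal{J}(P) > \mathcal{J}(H_0)$ with Lemma 5.17. □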

Definition 5.19. For any $\theta$ in a distributing tree and an antecedent
$\theta'$ of $\theta$, $[\theta, \theta']_D$ (respectively $[\theta, \theta'[_I$)
will denote the set of the decreasing judgments (respectively Ind-judgments)
between $\theta$ and $\theta'$ (respectively without $\theta'$).

Fact 5.20. Let $\mathcal{E}$ be a specification of a function $f$ and
$\mathcal{A}$ be a distributing tree for $\mathcal{E}$. If $\theta_1$ is the root
of $\mathcal{A}$, that is to say the termination statement of $f$, and if
$\theta$ is an Ind-judgment in $\mathcal{A}$, then
$card(\mathcal{R}(\mathcal{N}(\theta))) = card([\theta_1, \theta[_I)$.

Proof: Since $card(\mathcal{R}(\theta)) = card(\mathcal{R}(\mathcal{N}(\theta))) + 1$,
it is actually enough to show that
$card(\mathcal{R}(\theta)) = card([\theta_1, \theta]_I)$, which is then
straightforward by induction on the number of judgments
$\theta_1, \ldots, \theta$. □

Fact 5.21. Let $P$ and $P'$ be two distinct restrictive hypotheses of a judgment
$\theta$; then $\mathcal{J}(P) \ne \mathcal{J}(P')$.

Proof: The opposite leads to a contradiction thanks to Lemma 5.13. □

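Lemma 5.22. Let $\mathcal{A}$ be a distributing tree having the r.t.s.p. with the
root $\theta_1$. Let $P$ be the restrictive hypothesis of a leaf $\theta_k$ in
the definition of the r.t.s.p.; then for all
$\theta \in [\theta_1, \theta_{\mathcal{J}(P)}[_I$, there is one and only one
$H \in \mathcal{R}(P)$ such that

$\theta = \theta_{\mathcal{J}(\Upsilon_{P,\theta_k}(H))}$.

Proof: By Lemma 5.13, for all $H \in \mathcal{R}(P)$,
$\theta_{\mathcal{J}(\Upsilon_{P,\theta_k}(H))}$ is an Ind-judgment. Furthermore
Lemma 5.18 says that $\mathcal{J}(P) > \mathcal{J}(\Upsilon_{P,\theta_k}(H))$ and
so $\theta_{\mathcal{J}(\Upsilon_{P,\theta_k}(H))} \in [\theta_1, \theta_{\mathcal{J}(P)}[_I$.
Let $U = \bigcup_{H \in \mathcal{R}(P)} \{\theta_{\mathcal{J}(\Upsilon_{P,\theta_k}(H))}\}$,
which is included in $[\theta_1, \theta_{\mathcal{J}(P)}[_I$. As
$\Upsilon_{P,\theta_k}$ is injective, then, using Fact 5.21, we get
$card(U) = card(\mathcal{R}(P))$. Now

$card([\theta_1, \theta_{\mathcal{J}(P)}[_I) = card(\mathcal{R}(\mathcal{N}(\theta_{\mathcal{J}(P)})))$ (Fact 5.20)
$= card(\mathcal{R}(\mathcal{R}es^{-1}_{\theta_k,\theta_{\mathcal{J}(P)}}(P)))$ (Lemma 5.13)
$= card(\mathcal{R}(P))$ (Corollary 5.14)

Hence $U = [\theta_1, \theta_{\mathcal{J}(P)}[_I$. □

Lemma 5.23. Let $\theta$ and $\theta'$ be two judgments in a distributing tree of
a specification; then $\mathcal{C}(\theta)$ and $\mathcal{C}(\theta')$ match the
same term iff $\theta$ and $\theta'$ are in the same branch.

Proof: Fact 4.4 gives one direction; the other is obtained by assuming the
opposite and using the fact that if a judgment does not match a term, then its
antecedents do not either. □

Lemma 5.24. Let $\theta$ be a judgment in a distributing tree of a specification
and $\theta'$ an antecedent of $\theta$. If $P$ is a restrictive hypothesis of
$\theta'$ such that $\theta_{\mathcal{J}(P)-1} \in [\theta, \theta'[_D$ then
$\mathcal{C}(\theta)$ matches $\mathcal{C}(P)$.

Proof: By the previous lemma $\mathcal{C}(\theta)$ matches
$\mathcal{C}(\theta_{\mathcal{J}(P)-1})$. Furthermore, let $Q$ denote
$\mathcal{R}es^{-1}_{\theta',\theta_{\mathcal{J}(P)}}(P)$; then $Q \approx P$ by
Lemma 5.9 and so $\mathcal{C}(Q) = \mathcal{C}(P)$. Now, since $Q$ is the new
hypothesis of $\theta_{\mathcal{J}(P)}$, it is easy to see that
$\mathcal{C}(\theta_{\mathcal{J}(P)-1})$ matches $\mathcal{C}(Q)$. Hence
$\mathcal{C}(\theta)$ matches $\mathcal{C}(P)$. □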

We now state the main Proposition below that enables Theorem 2 to hold. 

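Proposition 5.25. Let $\mathcal{A}$ be a distributing tree of a specification
$\mathcal{E}$ with the right terminal state property and $(t, u)$ be an inductive
call of $\mathcal{E}$. Let also
$B = (\theta_1, x_1), \ldots, (\theta_{k-1}, x_{k-1}), \theta_k$ be a branch of
$\mathcal{A}$ with $\mathcal{C}(\theta_k) = t$. Let $P$ be a restrictive
hypothesis of $\theta_k$ and $\sigma^u$ be the substitution such that
$\sigma^u(\mathcal{C}(P)) = u$ with respect to the r.t.s.p. Then for each
decreasing judgment $\theta_i$ in $B$ which is a strict descendant of
$\theta_{\mathcal{J}(P)-1}$ (i.e. $i < \mathcal{J}(P) - 1$),
$\mathcal{C}(\theta_i)$ (respectively $\mathcal{C}(\theta_{\mathcal{J}(P)-1})$)
matches $u$ according to a substitution $\sigma^u_i$ (respectively
$\sigma^u_{\mathcal{J}(P)-1}$) and

$m(\varphi \circ \sigma^u_i(x_i)) < m(\varphi \circ \sigma_{k,i}(x_i))$,
$m(\varphi \circ \sigma^u_{\mathcal{J}(P)-1}(x_{\mathcal{J}(P)-1})) < m(\varphi \circ \sigma_{k,\mathcal{J}(P)-1}(x_{\mathcal{J}(P)-1}))$

for all ground constructor substitutions $\varphi$ (where the $\sigma_{k,j}$ are
given in Fact 4.4).

Proof: Let $\theta_i$ be a decreasing judgment with $i < \mathcal{J}(P) - 1$. By
Fact 4.4, we know that $\mathcal{C}(\theta_i)$ matches
$\mathcal{C}(\theta_{\mathcal{J}(P)-1})$, which also matches $\mathcal{C}(P)$
according to Lemma 5.24 (with $\theta = \theta_{\mathcal{J}(P)-1}$,
$\theta' = \theta_k$), and so $\mathcal{C}(\theta_i)$ matches $u$.

Now, we are going to show the first inequality. Since $\theta_{i+1}$ is an
Ind-judgment, by Lemma 5.22, there is a restrictive hypothesis $Q$ of $P$ such
that $\mathcal{J}(\Upsilon_{P,\theta_k}(Q)) = i + 1$. Let
$\mathcal{N}(\theta_{i+1}) = \forall z\,(Dz_{\prec s} \to G)$ be the new
hypothesis of $\theta_{i+1}$ and let $Q_0$ be $\Upsilon_{P,\theta_k}(Q)$. We know
that $Q \approx Q_0$, likewise
$Q_0 \approx \mathcal{R}es^{-1}_{\theta_k,\theta_{\mathcal{J}(Q_0)}}(Q_0) = \mathcal{N}(\theta_{i+1})$.
Hence $Q \approx \mathcal{N}(\theta_{i+1})$ and we can write
$Q = \forall z\,(Dz_{\prec s'} \to G)$ and $Q_0 = \forall z\,(Dz_{\prec s_0} \to G)$.

Now

$x_i \triangleright \mathcal{C}(\theta_i) = z \triangleright \mathcal{C}(\mathcal{N}(\theta_{i+1}))$ (Remark 5.5)
$= z \triangleright \mathcal{C}(Q)$ ($Q \approx \mathcal{N}(\theta_{i+1})$)
$= s' \triangleright \mathcal{C}(P)$ (Lemma 5.6).

Moreover $\mathcal{C}(\theta_i)$ matches $\mathcal{C}(P)$ which matches $u$.
Then, with the previous equalities, we have $\sigma^u_i(x_i) = \sigma^u(s')$.

Furthermore:

$z \triangleright \mathcal{C}(Q) = z \triangleright \mathcal{C}(\Upsilon_{P,\theta_k}(Q))$ ($Q \approx \Upsilon_{P,\theta_k}(Q)$)
$= s_0 \triangleright \mathcal{C}(\theta_k)$ (Lemma 5.6)
$= s_0 \triangleright t$ ($\mathcal{C}(\theta_k) = t$).

With these equalities we have
$x_i \triangleright \mathcal{C}(\theta_i) = s_0 \triangleright t$. Hence, since
$\mathcal{C}(\theta_i)$ matches $\mathcal{C}(\theta_k) = t$, we get
$\sigma_{k,i}(x_i) = s_0$.

Finally, point 2) of the right terminal state property says that
$\sigma^u(s') \sqsubset s_0$, and so, by Property 4.11,
$m(\varphi \circ \sigma^u(s')) < m(\varphi(s_0))$. That is to say
$m(\varphi \circ \sigma^u_i(x_i)) < m(\varphi \circ \sigma_{k,i}(x_i))$.

It remains to show the second inequality. We recall that
$\theta_{\mathcal{J}(P)}$ is an Ind-judgment whose new hypothesis is
$\mathcal{N}(\theta_{\mathcal{J}(P)}) = \mathcal{R}es^{-1}_{\theta_k,\theta_{\mathcal{J}(P)}}(P) \approx P$.
Then

$x_{\mathcal{J}(P)-1} \triangleright \mathcal{C}(\theta_{\mathcal{J}(P)-1}) = z \triangleright \mathcal{C}(\mathcal{N}(\theta_{\mathcal{J}(P)}))$ (Remark 5.5)
$= z \triangleright \mathcal{C}(P)$ ($\mathcal{C}(\mathcal{N}(\theta_{\mathcal{J}(P)})) = \mathcal{C}(P)$)
$= s \triangleright \mathcal{C}(\theta_k)$ (Lemma 5.6)
$= s \triangleright t$ ($\mathcal{C}(\theta_k) = t$).

Thus $\sigma_{k,\mathcal{J}(P)-1}(x_{\mathcal{J}(P)-1}) = s$.

Furthermore, we have seen for the first inequality that
$\mathcal{C}(\theta_{\mathcal{J}(P)-1})$ matches $\mathcal{C}(P)$ which matches
$u$; then, by a previous equality,
$\sigma^u_{\mathcal{J}(P)-1}(x_{\mathcal{J}(P)-1}) = \sigma^u(z)$. Now using
point 1) of the right terminal state property, we have
$\sigma^u(z) \sqsubset s$ which gives, by Property 4.11,
$m(\varphi \circ \sigma^u(z)) < m(\varphi(s))$. Therefore
$m(\varphi \circ \sigma^u_{\mathcal{J}(P)-1}(x_{\mathcal{J}(P)-1})) < m(\varphi \circ \sigma_{k,\mathcal{J}(P)-1}(x_{\mathcal{J}(P)-1}))$. □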

6 Conclusion 

While the measures found from the termination proofs of the recursive definition
command of Coq were shown in [14] to be suitable for other systems such as the
NQTHM of [2,3], they cannot be defined in the extended termination system
without losing the decreasing property. We have solved the problem by showing
the existence of other decreasing measures in the extended termination system
in question (the new ProPre of [12]). Moreover, the new measures found in this
paper enlarge the class of suitable measures in the sense that each recursive
algorithm proven to terminate in the previous system ProPre [11] is also proven
to terminate in the extended ProPre system [12].

The orders characterised by the measures differ from the lexicographic
combinations of one fixed ordering [18,15,19]. We can also mention the work of
[7], which supports the use of term orderings coming from the rewriting systems
area, especially those methods of [5,17] which aim at automatically synthesising
suitable polynomial orderings for termination of functional programs.

There is no longer any obstacle to providing these measures to other systems
that require them. The investigations of formal proofs in this paper highlight
new measures and advocate, as in [14], a termination method based on ordinal
measures.

Abstract. To simplify the task of proving termination of term rewriting
systems, several elimination methods, such as the dummy elimination,
the distribution elimination, the general dummy elimination and the
improved general dummy elimination, have been proposed. In this paper,
we show that the argument filtering method combined with the dependency
pair technique is essential in all the above elimination methods.
We present remarkably simple proofs for the soundness of these
elimination methods based on this observation. Moreover, we propose a new
elimination method, called the argument filtering transformation, which
is not only more powerful than all the other elimination methods but
also especially useful for making clear the essential relation hidden behind
these methods.

Keywords: Term Rewriting System, Termination, Elimination Method, 
Dependency Pair, Argument Filtering 



1 Introduction 

Term Rewriting Systems (TRSs) can be regarded as a model for computation 
in which terms are reduced, using a set of directed equations. They are used 
to represent abstract interpreters of functional programming languages and to 
model formal manipulating systems used in various applications, such as program 
optimization, program verification and automatic theorem proving [5,9]. 

Termination is one of the most fundamental properties of term rewriting sys- 
tems. While in general termination of TRSs is an undecidable property, several 
methods for proving termination have been developed. To simplify the task of 
proving termination of TRSs to which these methods cannot be directly ap- 
plied, several elimination methods have been proposed. Elimination methods 
try to transform a given TRS into a TRS whose termination is easier to prove 
than the original one. The dummy elimination [6], the distribution elimination 
[13,16], the general dummy elimination [7] and the improved general dummy 
elimination [15] are examples of elimination methods. 

Recently, Arts and Giesl proposed the notion of the dependency pair, which 
can offer an effective method for analyzing an infinite reduction sequence [1,2,3]. 
Using dependency pairs, we can easily show the termination property of TRSs 
to which traditional techniques cannot be applied. Since this method compares 



G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 47-61, 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 






Keiichirou Kusakari et al. 



rewrite rules and dependency pairs by a weak reduction pair instead of a
reduction order, it is necessary to find an appropriate weak reduction pair for
a given TRS. The argument filtering method introduced in [4,8] allows us to make
a weak reduction pair from an arbitrary reduction order.

In this paper, we first extend the argument filtering method by combining it
with the subterm relation. Next, we study the relation between the argument
filtering method and various elimination methods. The key to our result is the
observation that the argument filtering method combined with the dependency
pair technique is essential in all the above elimination methods. Indeed, we
present remarkably simple proofs for the soundness of these elimination methods
based on this observation, though the original proofs presented in the literature
[6,7,13,15,16] are complicated and treat the methods rather differently.
This observation also leads us to a new powerful elimination method,
called the argument filtering transformation, which is not only more powerful
than all the other elimination methods but also especially useful for making
clear the essential relation hidden behind these methods. The main contributions
of this paper are as follows:

(1) We show that the argument filtering method combined with the dependency
pair technique can clearly explain in a uniform framework why various
elimination methods work well.

(2) A new powerful elimination method, called the argument filtering
transformation, is proposed. Since the transformation is carefully designed by
removing all unnecessary rewrite rules generated by other elimination methods,
it is the most powerful among these elimination methods.

(3) We make clear the relation among various elimination methods by
comparing each of them with a corresponding restricted argument filtering
transformation. For example, the dummy elimination method can be seen as a
restricted argument filtering transformation in which each argument filtering
always removes all arguments, and the distribution elimination method
restricts each argument filtering to a collapsing one.

The remainder of this paper is organized as follows. The next section gives 
the definition of term rewriting systems. In section 3, we explain the dependency 
pair technique and introduce a new argument filtering method. Using these re- 
sults, we show a general and essential property for elimination methods to be 
sound with respect to termination. In section 4, we propose the argument filter- 
ing transformation and show the soundness of this transformation. In section 5, 
we compare various elimination methods with the argument filtering transfor- 
mation, and give simple proofs for the soundness of these elimination methods. 

2 Preliminaries 

We assume that the reader is familiar with notions of term rewriting systems [5]. 

A signature $\Sigma$ is a finite set of function symbols, where each
$f \in \Sigma$ is associated with a non-negative integer $n$, written
$arity(f)$. A set $V$ is an enumerable set of variables with
$\Sigma \cap V = \emptyset$. The set of terms constructed from $\Sigma$ and $V$
is written $T(\Sigma, V)$. $Var(t)$ is the set of variables in $t$. We define
$root(f(t_1, \ldots, t_n)) = f$. Identity of terms is denoted by $\equiv$. A
substitution $\theta : V \to T(\Sigma, V)$ is a mapping. A substitution over
terms is defined as a homomorphic extension. We write $t\theta$ instead of
$\theta(t)$. A context is a term which has a special constant $\Box$, called a
hole. A term $C[t]$ denotes the result of replacing the hole of $C$ by $t$.

A rewrite rule is a pair of terms, written $l \to r$, with $l \notin V$ and
$Var(l) \supseteq Var(r)$. A term rewriting system (TRS) is a set of rules. The
set of defined symbols in a TRS $R$ is denoted by
$DF(R) = \{root(l) \mid l \to r \in R\}$. A reduction relation $\to_R$ is defined
as follows:
$s \to_R t \iff \exists l \to r \in R, \exists C[\,], \exists \theta\ (s \equiv C[l\theta] \wedge t \equiv C[r\theta])$.
We often omit the subscript $R$ whenever no confusion arises. A TRS $R$ is
terminating if there is no infinite sequence
$t_0 \to_R t_1 \to_R t_2 \to_R \cdots$. The transitive-reflexive closure and the
transitive closure of $\to$ are denoted by $\to^*$ and $\to^+$ respectively.

A strict order $>$ is a reduction order if $>$ is well-founded, monotonic
($s > t \Rightarrow C[s] > C[t]$) and stable
($s > t \Rightarrow s\theta > t\theta$). Note that a TRS $R$ is terminating iff
there exists a reduction order $>$ that satisfies $l > r$ for all
$l \to r \in R$. A reduction order $>$ is a simplification order if $C[t] > t$
for all $t$ and $C$ ($\not\equiv \Box$). A TRS $R$ is simply terminating if there
exists a simplification order $>$ that satisfies $l > r$ for all
$l \to r \in R$.

3 Soundness Condition for Transformation 

In this section, we first explain the dependency pair and the argument filtering 
method, whose notions greatly extend the provable class of termination. Using 
these notions, we show a theorem, which makes a general and essential property 
clear for transformations of TRSs to be sound with respect to termination. 

Definition 3.1. [1,2,3] $\Sigma^\# = \{f^\# \mid f \in \Sigma\}$ is the set of
marked symbols disjoint from $\Sigma \cup V$. We define the root-marked terms by
$(f(t_1, \ldots, t_n))^\# = f^\#(t_1, \ldots, t_n)$. The set of the dependency
pairs of $R$, written $DP^\#(R)$, is
$\{\langle u^\#, v^\# \rangle \mid u \to C[v] \in R,\ root(v) \in DF(R)\}$. The
set of the unmarked dependency pairs of $R$, written $DP(R)$, is obtained by
erasing marks of symbols in $DP^\#(R)$.

Example 3.2. Let $R = \{add(x, 0) \to x,\ add(x, s(y)) \to s(add(x, y))\}$. Then
$DP^\#(R) = \{\langle add^\#(x, s(y)), add^\#(x, y) \rangle\}$ and
$DP(R) = \{\langle add(x, s(y)), add(x, y) \rangle\}$.
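The construction of dependency pairs is simple enough to sketch directly. The following Python code is an illustration under stated assumptions: terms are nested tuples `(symbol, arg1, ..., argn)`, variables are strings, a rule is a pair `(lhs, rhs)`, and a marked symbol is written by appending `#` to its name. These encodings and the function names are choices made for this sketch.

```python
def is_var(t):
    """Variables are encoded as plain strings (an assumption of this sketch)."""
    return isinstance(t, str)


def subterms(t):
    """Enumerate t and all of its subterms."""
    yield t
    if not is_var(t):
        for a in t[1:]:
            yield from subterms(a)


def defined_symbols(rules):
    """DF(R): the root symbols of the left-hand sides."""
    return {lhs[0] for lhs, _ in rules}


def dependency_pairs(rules, marked=True):
    """For every rule u -> C[v] with root(v) defined, collect the pair
    (u, v), with root symbols marked when marked=True."""
    defined = defined_symbols(rules)

    def mark(t):
        return (t[0] + "#",) + t[1:] if marked else t

    pairs = []
    for lhs, rhs in rules:
        for v in subterms(rhs):
            if not is_var(v) and v[0] in defined:
                pair = (mark(lhs), mark(v))
                if pair not in pairs:
                    pairs.append(pair)
    return pairs


# The TRS of Example 3.2.
R = [
    (("add", "x", ("0",)), "x"),
    (("add", "x", ("s", "y")), ("s", ("add", "x", "y"))),
]

print(dependency_pairs(R))
print(dependency_pairs(R, marked=False))
```

On the TRS of Example 3.2 both calls return a single pair, built from the second rule.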

Definition 3.3. A pair $(\gtrsim, >)$ of binary relations on terms is a weak
reduction pair if it satisfies the following three conditions:

- $\gtrsim$ is monotonic and stable
- $>$ is stable and well-founded
- $\gtrsim \cdot > \ \subseteq\ >$ or $> \cdot \gtrsim \ \subseteq\ >$

In the above definition, we do not assume that $\gtrsim$ is a quasi-order
(reflexive and transitive) or $>$ is a strict order (irreflexive and transitive).
This simplifies the design of a weak reduction pair. We should mention that this
simplification does not lose the generality of our definition, because for a
given weak reduction pair $(\gtrsim, >)$ we can make the weak reduction pair
$(\gtrsim^*, >^+)$ in which $\gtrsim^*$ is a quasi-order and $>^+$ is a strict
order. Note that $>$ is a reduction order if and only if $(>, >)$ is a weak
reduction pair.

Theorem 3.4. For any TRS $R$, the following three properties are equivalent.

1. TRS $R$ is terminating.

2. There exists a weak reduction pair $(\gtrsim, >)$ such that
$\forall l \to r \in R.\ l \gtrsim r$ and
$\forall \langle u, v \rangle \in DP(R).\ u > v$.

3. There exists a weak reduction pair $(\gtrsim, >)$ such that
$\forall l \to r \in R.\ l \gtrsim r$ and
$\forall \langle u^\#, v^\# \rangle \in DP^\#(R).\ u^\# > v^\#$.

Proof. ($1 \Rightarrow 2$) We define $s \gtrsim t$ by $s \to^*_R t$, and $s > t$
by $s \to^+_R C[t]$ for some $C$. Then, it is trivial that $(\gtrsim, >)$ is a
weak reduction pair such that $\forall l \to r \in R.\ l \gtrsim r$ and
$\forall \langle u, v \rangle \in DP(R).\ u > v$. ($2 \Rightarrow 3$) It is easily
shown by identifying $f^\#$ with $f$. ($3 \Rightarrow 1$) This case has already
been shown in [1,2,3,4,8]. □

Note that the proofs for ($2 \Rightarrow 1$) and for ($1 \Rightarrow 3$) have
already been shown in [10,14] and [4] respectively.

The above theorem shows that the weak reduction pair plays an important
role. To design a weak reduction pair, the argument filtering method introduced
in [4,8] is very useful, which is defined as recursive program schemata [9]. In the
next definition, we introduce a new argument filtering method by combining it
with the subterm relation, which is more effective in our framework than the
original one.

Definition 3.5. An argument filtering function is a function $\pi$ such that for
any $f \in \Sigma$, $\pi(f)$ is either an integer $i$ or a list of integers
$[i_1, \ldots, i_m]$ ($m \ge 0$) where those integers $i, i_1, \ldots, i_m$ are
positive and not more than $arity(f)$. We can naturally extend $\pi$ over terms
as follows:

$\pi(x) = x$
$\pi(f(t_1, \ldots, t_n)) = \pi(t_i)$ if $\pi(f) = i$
$\pi(f(t_1, \ldots, t_n)) = f(\pi(t_{i_1}), \ldots, \pi(t_{i_m}))$ if $\pi(f) = [i_1, \ldots, i_m]$

For any argument filtering function $\pi$ and any reduction order $>$, we define
the pair $(\gtrsim_\pi, >_\pi)$ as follows:



$^1$ The proof for ($1 \Rightarrow 3$) in [4] is based on the claim that if $R$ is
terminating then so is $R \cup DP^\#(R)$. However, the same proof method cannot
work for ($1 \Rightarrow 2$), because the termination of $R$ does not ensure
that of $R \cup DP(R)$ [11].

We hereafter assume that if $\pi(f)$ is not defined explicitly then it is
intended to be $[1, \ldots, arity(f)]$.

Example 3.6. Let $t = f(e(e'(0, 1), 2), e''(3, 4, 5))$, $\pi(e) = 1$,
$\pi(e') = []$ and $\pi(e'') = [1, 3]$. Then $\pi(t) = f(e', e''(3, 5))$.
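The extension of an argument filtering function to terms can be sketched as a short recursive function. The Python code below is an illustration only: terms are nested tuples `(symbol, arg1, ..., argn)`, variables are strings, and the filtering is a dictionary that maps a symbol either to an integer (collapsing case) or to a list of integers. Symbols missing from the dictionary keep all their arguments, following the default convention stated above. The names are hypothetical.

```python
def is_var(t):
    """Variables are encoded as plain strings (an assumption of this sketch)."""
    return isinstance(t, str)


def apply_filter(pi, t):
    """Extend the argument filtering pi (a dict) to the term t."""
    if is_var(t):
        return t
    symbol, args = t[0], t[1:]
    # Default: keep every argument, i.e. [1, ..., arity(f)].
    choice = pi.get(symbol, list(range(1, len(args) + 1)))
    if isinstance(choice, int):
        # Collapsing case: the term is replaced by its filtered i-th argument.
        return apply_filter(pi, args[choice - 1])
    # List case: keep the symbol and only the selected arguments.
    return (symbol,) + tuple(apply_filter(pi, args[i - 1]) for i in choice)


def const(name):
    """A constant is a symbol with no arguments."""
    return (name,)


# The term and filtering of Example 3.6.
t = (
    "f",
    ("e", ("e'", const("0"), const("1")), const("2")),
    ("e''", const("3"), const("4"), const("5")),
)
pi = {"e": 1, "e'": [], "e''": [1, 3]}

print(apply_filter(pi, t))
```

The result is the tuple encoding of $f(e', e''(3, 5))$, in agreement with Example 3.6.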

Theorem 3.7. Let $>$ be a reduction order and $\pi$ an argument filtering
function. Then, the pair $(\gtrsim_\pi, >_\pi)$ is a weak reduction pair.

Proof. We define the substitution $\pi(\theta)$ as
$\pi(\theta)(x) = \pi(\theta(x))$. Then, the claim
$\pi(t\theta) \equiv \pi(t)\pi(\theta)$ is easily proved by induction on $t$.
Using this claim, the stability of $\gtrsim_\pi$ and $>_\pi$ is easily proved.
The other conditions are routine. □

Definition 3.8. We define the including relation $\sqsubseteq$ as follows:

$R_1 \sqsubseteq R_2 \iff \forall l \to r \in R_1.\ \exists C.\ l \to C[r] \in R_2$

Theorem 3.9. Let $R$ be a TRS, $R'$ a terminating TRS and $\pi$ an argument
filtering function. If $\pi(R) \sqsubseteq R'$ and $\pi(DP(R)) \sqsubseteq R'$
then $R$ is terminating.

Proof. We define $>$ by $\to^+_{R'}$. The termination of $R'$ ensures that $>$ is
a reduction order. Using the argument filtering method, we design the weak
reduction pair $(\gtrsim_\pi, >_\pi)$. It is obvious that
$\forall l \to r \in R.\ l \gtrsim_\pi r$ and
$\forall \langle u, v \rangle \in DP(R).\ u >_\pi v$. From Theorem 3.4, $R$ is
terminating. □

Taking $R$ and $R'$ as a given TRS and a transformed TRS in elimination
methods respectively, the above simple theorem can uniformly explain why
elimination methods work well. This fact is very interesting because in the
original literature the soundness of these elimination methods was proved by
complicated and differing methods. In the following sections, we will explain
how Theorem 3.9 simplifies the conditions required in elimination methods into
acceptable ones.

Corollary 3.10. Let $R$ be a TRS, $R'$ a simply terminating TRS and $\pi$ an
argument filtering function. If $\pi(R \cup DP(R)) \sqsubseteq R'$ then $R$ is
terminating.

Proof. Similar to Theorem 3.9, by defining $>$ as $\to^+_{R' \cup Emb}$. □

4 Argument Filtering Transformation 

In this section, we design a new elimination method, called the argument filtering
transformation. This transformation is designed based on Theorem 3.9, which is
essential for elimination methods.

Definition 4.1. (Argument Filtering Transformation) Let $\pi$ be an argument
filtering function. The argument filtering transformation ($AFT_\pi$) is defined
as follows:

$dec_\pi(x) = \emptyset$

$dec_\pi(f(t_1, \ldots, t_n)) = \bigcup_{j \ne i} \{t_j\} \cup \bigcup_{j=1}^{n} dec_\pi(t_j)$ if $\pi(f) = i$

$dec_\pi(f(t_1, \ldots, t_n)) = \bigcup_{i \notin \pi(f)} \{t_i\} \cup \bigcup_{i=1}^{n} dec_\pi(t_i)$ otherwise

pickTr{T) = {t G T \ Irft) includes some defined symbols of R} 

where = Af) = ^ 

\ ’’■(Z) otherwise 

AFTtt{R) = tt{R) U {tt{1) Tr(r') \ I ^ r G R, r' G picfc^(dec 7 r(r))} 



Example 4.2. Let

$R = \{ f(x, f(x, x)) \to f(e(e'(0, 1, 2), 3), e''(f(4, 5), 6)),\ 4 \to 1,\ 5 \to 1 \}$.

Here, $DF(R) = \{f, 4, 5\}$. Let $\pi(e) = []$, $\pi(e') = [1,3]$, $\pi(e'') = 2$ and $r =
f(e(e'(0, 1, 2), 3), e''(f(4, 5), 6))$. Then, we obtain $AFT_\pi(R)$ as follows (Fig. 1):

$\pi(r) = f(e, 6)$

$dec_\pi(r) = \{e'(0, 1, 2),\ 1,\ 3,\ f(4, 5)\}$

$pick_\pi(dec_\pi(r)) = \{f(4, 5)\}$

$\pi(R) = \{f(x, f(x, x)) \to f(e, 6),\ 4 \to 1,\ 5 \to 1\}$

$AFT_\pi(R) = \pi(R) \cup \{ f(x, f(x, x)) \to f(4, 5) \}$

The termination of $AFT_\pi(R)$ is easily proved by recursive path order [5].
Thus, $R$ is terminating, if the argument filtering transformation is sound. The
soundness is shown in this section. Note that the termination of $R$ is not easily
proved, because $R$ is not simply terminating.
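The computation in Example 4.2 can be replayed with a short Python sketch. It is a reading of Definition 4.1 under explicit assumptions: terms are nested tuples, variables are strings, symbols missing from `pi` keep all their arguments, and $\bar{\pi}$ is taken to turn a collapsing entry $i$ into the list $[i]$. The function names (`filt`, `dec`, `pick`, `aft`, `const`) are chosen for the example only.

```python
def filt(pi, t):
    """pi(t): apply the argument filtering to a term."""
    if isinstance(t, str):
        return t
    f, args = t[0], t[1:]
    p = pi.get(f, list(range(1, len(args) + 1)))
    if isinstance(p, int):
        return filt(pi, args[p - 1])
    return (f,) + tuple(filt(pi, args[i - 1]) for i in p)


def dec(pi, t):
    """dec_pi(t): the arguments removed by pi, collected recursively."""
    if isinstance(t, str):
        return set()
    f, args = t[0], t[1:]
    p = pi.get(f, list(range(1, len(args) + 1)))
    kept = [p] if isinstance(p, int) else p     # j != i  is  j not in [i]
    out = {a for j, a in enumerate(args, start=1) if j not in kept}
    for a in args:
        out |= dec(pi, a)
    return out


def symbols(t):
    """All function symbols occurring in t."""
    if isinstance(t, str):
        return set()
    return {t[0]}.union(*(symbols(a) for a in t[1:]))


def pick(pi, defined, terms):
    """pick_pi(T): keep terms whose pi-bar image mentions a defined symbol."""
    bar = {f: ([p] if isinstance(p, int) else p) for f, p in pi.items()}
    return {t for t in terms if symbols(filt(bar, t)) & defined}


def aft(pi, defined, rules):
    """AFT_pi(R) for a list of (lhs, rhs) rules."""
    out = {(filt(pi, l), filt(pi, r)) for l, r in rules}
    for l, r in rules:
        out |= {(filt(pi, l), filt(pi, r2))
                for r2 in pick(pi, defined, dec(pi, r))}
    return out


def const(name):
    return (name,)


# Example 4.2
lhs = ("f", "x", ("f", "x", "x"))
rhs = ("f",
       ("e", ("e'", const("0"), const("1"), const("2")), const("3")),
       ("e''", ("f", const("4"), const("5")), const("6")))
R = [(lhs, rhs), (const("4"), const("1")), (const("5"), const("1"))]
pi = {"e": [], "e'": [1, 3], "e''": 2}
defined = {"f", "4", "5"}
result = aft(pi, defined, R)
```

Under these assumptions `result` contains the three filtered rules of $\pi(R)$ plus the extra rule $f(x, f(x, x)) \to f(4, 5)$, matching the example.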



Lemma 4.3. Let $C$ be a context and $t$ a term. Then, there exists a context $D$
such that $D[\pi(t)] \in \pi(dec_\pi(C[t]))$ or $D[\pi(t)] = \pi(C[t])$.

Proof. We prove the claim by induction on the structure of $C$. In the case $C = \Box$,
it is trivial. Suppose that $C = f(t_1, \ldots, t_{i-1}, C', t_{i+1}, \ldots, t_n)$. From the induction
hypothesis, there exists a context $D'$ such that $D'[\pi(t)] \in \pi(dec_\pi(C'[t]))$ or
$D'[\pi(t)] = \pi(C'[t])$. In the former case, it follows that $D'[\pi(t)] \in \pi(dec_\pi(C'[t]))
\subseteq \pi(dec_\pi(C[t]))$. In the latter case, if $i = \pi(f)$ or $i \in \pi(f)$ then it is trivial. Otherwise,
$D'[\pi(t)] = \pi(C'[t]) \in \pi(dec_\pi(C[t]))$ from the definition of $dec_\pi$. □

Theorem 4.4. If $AFT_\pi(R)$ is terminating then $R$ is terminating.

Proof. From the definition, $\pi(R) \subseteq AFT_\pi(R)$. Let $\langle u, v \rangle \in DP(R)$. From the
definition of $DP$, there exists a rule $u \to C[v] \in R$. From Lemma 4.3, there exists
a context $D$ such that $D[\pi(v)] \in \pi(dec_\pi(C[v]))$ or $D[\pi(v)] = \pi(C[v])$. In the former
case, from the definition of $DP$ and $\pi$, $root(\pi(v))$ is a defined symbol. Thus,
$D[\pi(v)] \in \pi(pick_\pi(dec_\pi(C[v])))$. Therefore, it follows that $\pi(u) \to D[\pi(v)] \in
AFT_\pi(R)$. In the latter case, it follows that $\pi(u) \to D[\pi(v)] \in \pi(R) \subseteq AFT_\pi(R)$.
From Theorem 3.9, $R$ is terminating. □

[Figure: the term $r$ of Example 4.2 drawn as a tree, with $dec_\pi(r) = \{1, e'(0, 1, 2), 3, f(4, 5)\}$ and $pick_\pi(dec_\pi(r)) = \{f(4, 5)\}$.]

Fig. 1. Argument Filtering Transformation

From the proof of the above theorem, it is obvious that the second component
$\{\pi(l) \to \pi(r') \mid l \to r \in R,\ r' \in pick_\pi(dec_\pi(r))\}$ of the definition of the argument
filtering transformation $AFT_\pi$ is used only to keep the information of dependency
pairs. Thus, introducing redundant contexts does not destroy the soundness of
the argument filtering transformation. Therefore, we can define another argument
filtering transformation $AFT^{C_i}_\pi(R)$ as

$AFT^{C_i}_\pi(R) = \pi(R) \cup \{l_1 \to C_1[r_1], \ldots, l_n \to C_n[r_n]\}$

where $\{l_1 \to r_1, \ldots, l_n \to r_n\} = \{\pi(l) \to \pi(r') \mid l \to r \in R,\ r' \in pick_\pi(dec_\pi(r))\}$

and $C_i$ denotes the list of contexts $C_1, C_2, \ldots, C_n$.

Corollary 4.5. If $AFT^{C_i}_\pi(R)$ is terminating then $R$ is terminating.

5 Comparison with Other Eliminations 

For proving termination, several transformation methods that simplify the
task have been proposed. Examples of such transformations are the dummy
elimination [6], the distribution elimination [13,16], the general dummy
elimination [7] and the improved general dummy elimination [15]. In
this section, we compare these elimination methods to the argument filtering
transformation. As a result, we conclude that the argument filtering
transformation is a generalization of these elimination methods.



5.1 Dummy Elimination 

Definition 5.1. (Dummy Elimination) [6] Let $e$ be a function symbol, called an
eliminated symbol. The dummy elimination ($DE_e$) is defined as follows:

$cap_e(x) = x$

$cap_e(e(t_1, \ldots, t_n)) = \diamond$

$cap_e(f(t_1, \ldots, t_n)) = f(cap_e(t_1), \ldots, cap_e(t_n))$  if $f \neq e$

$dec_e(x) = \emptyset$

$dec_e(e(t_1, \ldots, t_n)) = \bigcup_{i=1}^{n} (\{cap_e(t_i)\} \cup dec_e(t_i))$

$dec_e(f(t_1, \ldots, t_n)) = \bigcup_{i=1}^{n} dec_e(t_i)$  if $f \neq e$

$DE_e(R) = \{cap_e(l) \to r' \mid l \to r \in R,\ r' \in \{cap_e(r)\} \cup dec_e(r)\}$



Example 5.2. Let $t = f(e(0, g(1, e(2, 3))), 4)$. Then, $cap_e(t) = f(\diamond, 4)$ and
$dec_e(t) = \{0, 2, 3, g(1, \diamond)\}$ (Fig. 2).
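The functions $cap_e$ and $dec_e$ translate directly into recursive code. The Python sketch below is illustrative only: it assumes the nested-tuple encoding of terms (variables as strings), writes the fresh constant $\diamond$ as the hypothetical constant `DIAMOND`, and uses made-up function names.

```python
DIAMOND = ("<>",)   # stands for the fresh constant written as a diamond


def cap(e, t):
    """cap_e(t): cut the term at the topmost occurrences of e."""
    if isinstance(t, str):
        return t
    if t[0] == e:
        return DIAMOND
    return (t[0],) + tuple(cap(e, a) for a in t[1:])


def dec(e, t):
    """dec_e(t): the capped arguments of every e-subterm of t."""
    if isinstance(t, str):
        return set()
    out = set()
    for a in t[1:]:
        out |= dec(e, a)
        if t[0] == e:
            out.add(cap(e, a))
    return out


def dummy_elimination(e, rules):
    """DE_e(R) for a list of (lhs, rhs) rules."""
    return {(cap(e, l), r2)
            for l, r in rules
            for r2 in {cap(e, r)} | dec(e, r)}


def const(name):
    return (name,)


# Example 5.2
t = ("f",
     ("e", const("0"), ("g", const("1"), ("e", const("2"), const("3")))),
     const("4"))
capped = cap("e", t)
pieces = dec("e", t)
```

Here `capped` encodes $f(\diamond, 4)$ and `pieces` encodes $\{0, 2, 3, g(1, \diamond)\}$.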



[Figure: the term $t$ of Example 5.2 drawn as a tree, with $cap_e(t) = f(\diamond, 4)$ and $dec_e(t) = \{0, 2, 3, g(1, \diamond)\}$.]

Fig. 2. Dummy Elimination

Proposition 5.3. [6] If $DE_e(R)$ is terminating then $R$ is terminating.

For $\pi(e) = []$, we can treat the constant $\pi(e(\cdots))$ as $\diamond$.

Theorem 5.4. For $\pi(e) = []$, $AFT_\pi(R) \subseteq DE_e(R)$.

Proof. It suffices to show that $\pi(dec_\pi(t)) = dec_e(t)$ by induction on $t$. In the
case $t \in V$, it is trivial. Suppose that $t = f(t_1, \ldots, t_n)$. In the case $f \neq e$,
$\pi(dec_\pi(f(t_1, \ldots, t_n))) = \bigcup_{i=1}^{n} \pi(dec_\pi(t_i)) = \bigcup_{i=1}^{n} dec_e(t_i) = dec_e(f(t_1, \ldots, t_n))$.
In the case $f = e$, $\pi(dec_\pi(e(t_1, \ldots, t_n))) = \bigcup_{i=1}^{n} (\{\pi(t_i)\} \cup \pi(dec_\pi(t_i))) =
\bigcup_{i=1}^{n} (\{cap_e(t_i)\} \cup dec_e(t_i)) = dec_e(e(t_1, \ldots, t_n))$. □

This theorem means that the argument filtering transformation is a proper
extension of the dummy elimination, because $AFT_\pi(R)$ is terminating whenever
$DE_e(R)$ is terminating.

5.2 Distribution Elimination 
Definition 5.5. (Distribution Elimination) [16]

A rule $l \to r$ is a distribution rule for $e$ if $l = C[e(x_1, \ldots, x_n)]$ and $r =
e(C[x_1], \ldots, C[x_n])$ for some non-empty context $C$ in which $e$ does not occur
and pairwise different variables $x_1, \ldots, x_n$. Let $e$ be an eliminated symbol. The
distribution elimination ($DIS_e$) is defined as follows:

$$E_e(t) = \begin{cases} \{t\} & \text{if } t \in V \\ \bigcup_{i=1}^{n} E_e(t_i) & \text{if } t = e(t_1, \ldots, t_n) \\ \{f(s_1, \ldots, s_n) \mid s_i \in E_e(t_i)\} & \text{if } t = f(t_1, \ldots, t_n) \text{ with } f \neq e \end{cases}$$

$DIS_e(R) = \{l \to r' \mid l \to r \in R \text{ is not a distribution rule for } e,\ r' \in E_e(r)\}$



Example 5.6. Let $t = f(e(0, g(1, e(2, 3))), 4)$.

Then, $E_e(t) = \{f(0, 4),\ f(g(1, 2), 4),\ f(g(1, 3), 4)\}$ (Fig. 3).
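The set $E_e(t)$ enumerates every way of replacing each $e$-subterm by one of its arguments. A minimal Python sketch, assuming nested tuples for terms and strings for variables (the name `distribute` is hypothetical), is:

```python
from itertools import product


def distribute(e, t):
    """E_e(t): all terms obtained by choosing one argument at every e."""
    if isinstance(t, str):                      # variable
        return {t}
    f, args = t[0], t[1:]
    if f == e:
        # an e-node is replaced by any alternative of any of its arguments
        return set().union(*(distribute(e, a) for a in args))
    # other symbols are rebuilt over every combination of alternatives
    return {(f,) + combo
            for combo in product(*(distribute(e, a) for a in args))}


def const(name):
    return (name,)


# Example 5.6
t = ("f",
     ("e", const("0"), ("g", const("1"), ("e", const("2"), const("3")))),
     const("4"))
alternatives = distribute("e", t)
```

For a constant, `product()` over no arguments yields a single empty tuple, so the constant is returned unchanged.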

In general, the distribution elimination is not sound with respect to termination,
i.e., termination of $DIS_e(R)$ does not ensure termination of $R$. Thus, the
distribution elimination requires suitable restrictions to ensure the soundness.

Proposition 5.7. [13,16]

(a) If $DIS_e(R)$ is terminating and right-linear then $R$ is terminating.

(b) If $DIS_e(R)$ is terminating and the non-constant symbol $e$ does not occur in the
left-hand sides of $R$ then $R$ is terminating.



Lemma 5.8. Let $\pi(e) = 1$. Under the condition (b), for any $l \to r \in AFT_\pi(R)$,
there exists a context $C$ such that $l \to C[r] \in DIS_e(R)$.

[Figure: the term $t$ of Example 5.6 drawn as a tree, together with the set $E_e(t)$.]

Fig. 3. Distribution Elimination



Proof. From the definition of $AFT_\pi$, for any $l \to r \in AFT_\pi(R)$ there exists a
rule $l' \to C'[r'] \in R$ with $l = \pi(l')$ and $r = \pi(r')$. Thus, it suffices to show that
for any $t$ and $C'$, there exists a context $C$ such that $C[\pi(t)] \in E_e(C'[t])$. It is
easily proved by induction on $C'$. □



Theorem 5.9. Let $\pi(e) = 1$. Under the condition (b), the following properties
hold.

(i) $AFT^{C_i}_\pi(R) \subseteq DIS_e(R)$ for some $C_i$.

(ii) If $DIS_e(R)$ is simply terminating then $AFT_\pi(R)$ is simply terminating.

Proof. (i) It is trivial from Lemma 5.8.

(ii) From the assumption and (i), $AFT^{C_i}_\pi(R)$ is simply terminating. Thus
$AFT_\pi(R)$ is simply terminating since $AFT_\pi(R) \sqsubseteq AFT^{C_i}_\pi(R)$. □



5.3 General Dummy Elimination 



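For any $e \in F$, an $e$-status $\tau$ satisfies $\tau(e) = (\emptyset, 0)$ or $(I, i)$ with $i \in I$.

Definition 5.10. (General Dummy Elimination) [7]

Let $e$ be an eliminated symbol and $\tau(e) = (I, i)$. The general dummy elimination
($GDE_e$) is defined as follows:

$$cap_i(t) = \begin{cases} t & \text{if } t \in V \\ f(cap_i(t_1), \ldots, cap_i(t_n)) & \text{if } t = f(t_1, \ldots, t_n) \wedge f \neq e \\ cap_i(t_i) & \text{if } t = e(t_1, \ldots, t_n) \wedge i \neq 0 \\ \diamond & \text{if } t = e(t_1, \ldots, t_n) \wedge i = 0 \end{cases}$$

$$E_i(t) = \begin{cases} \{t\} & \text{if } t \in V \\ \{f(s_1, \ldots, s_n) \mid s_j \in E_i(t_j)\} & \text{if } t = f(t_1, \ldots, t_n) \wedge f \neq e \\ E(t_i) & \text{if } t = e(t_1, \ldots, t_n) \end{cases}$$

$$E(t) = \begin{cases} \{t\} & \text{if } t \in V \\ \{cap_0(t)\} & \text{if } I = \emptyset \\ \bigcup_{i \in I} E_i(t) & \text{if } I \neq \emptyset \end{cases}$$

$$dec(t) = \begin{cases} \emptyset & \text{if } t \in V \\ \bigcup_{j=1}^{n} dec(t_j) & \text{if } t = f(t_1, \ldots, t_n) \wedge f \neq e \\ \bigcup_{j=1}^{n} dec(t_j) \cup \bigcup_{j \notin I} E(t_j) & \text{if } t = e(t_1, \ldots, t_n) \end{cases}$$

$GDE_e(R) = \{cap_i(l) \to r' \mid l \to r \in R,\ r' \in E(r) \cup dec(r)\}$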



Example 5.11. Let $t = f(0, e(f(1, e(2, 3, 4)), 5, 6))$ and $\tau(e) = (\{1, 3\}, 1)$.

Then, $E(t) = \{f(0, 6),\ f(0, f(1, 2)),\ f(0, f(1, 4))\}$ and $dec(t) = \{5, 3\}$ (Fig. 4).
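Example 5.11 can be checked with a small Python sketch of $cap_i$, $E_i$, $E$ and $dec$. The code follows one reading of Definition 5.10 (in particular, an $e$-node contributes $E(t_j)$ to $dec$ only for positions $j \notin I$), assumes the nested-tuple encoding of terms, and uses hypothetical names throughout.

```python
from itertools import product

DIAMOND = ("<>",)   # stands for the fresh constant written as a diamond


def cap(e, i, t):
    """cap_i(t): replace e-subterms by their i-th argument (DIAMOND if i == 0)."""
    if isinstance(t, str):
        return t
    f, args = t[0], t[1:]
    if f != e:
        return (f,) + tuple(cap(e, i, a) for a in args)
    return DIAMOND if i == 0 else cap(e, i, args[i - 1])


def E_i(e, I, i, t):
    if isinstance(t, str):
        return {t}
    f, args = t[0], t[1:]
    if f != e:
        return {(f,) + combo
                for combo in product(*(E_i(e, I, i, a) for a in args))}
    return E(e, I, args[i - 1])


def E(e, I, t):
    if isinstance(t, str):
        return {t}
    if not I:
        return {cap(e, 0, t)}
    return set().union(*(E_i(e, I, i, t) for i in I))


def dec(e, I, t):
    if isinstance(t, str):
        return set()
    f, args = t[0], t[1:]
    out = set().union(*(dec(e, I, a) for a in args))
    if f == e:
        for j, a in enumerate(args, start=1):
            if j not in I:                      # positions outside I
                out |= E(e, I, a)
    return out


def gde(e, I, i, rules):
    """GDE_e(R) for the status (I, i)."""
    return {(cap(e, i, l), r2)
            for l, r in rules
            for r2 in E(e, I, r) | dec(e, I, r)}


def const(name):
    return (name,)


# Example 5.11 with status ({1, 3}, 1)
t = ("f", const("0"),
     ("e", ("f", const("1"), ("e", const("2"), const("3"), const("4"))),
      const("5"), const("6")))
e_set = E("e", {1, 3}, t)
dec_set = dec("e", {1, 3}, t)
```

The sets `e_set` and `dec_set` agree with $E(t)$ and $dec(t)$ as stated in the example.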

Proposition 5.12. [7] If $GDE_e(R)$ is terminating then $R$ is terminating.

Theorem 5.13. Let $\tau(e) = (I, i)$. In the case $\tau(e) = (\emptyset, 0)$, we define $\pi(e) = []$.
In the case $\tau(e) = (I, i)$ with $i \in I$, we define $\pi(e) = i$. Then, the following
properties hold.

(i) $AFT^{C_i}_\pi(R) \subseteq GDE_e(R)$ for some $C_i$.

(ii) If $GDE_e(R)$ is simply terminating then $AFT_\pi(R)$ is simply terminating.

Proof. (i) In the case $\tau(e) = (\emptyset, 0)$, it is trivial that $DE_e(R) = GDE_e(R)$. Thus,
$AFT_\pi(R) \subseteq GDE_e(R)$. In the case $\tau(e) = (I, i)$ with $i \in I$, the proof is similar to that of
Theorem 5.9 (i), replacing $E_e(r)$ with $dec(r) \cup E(r)$. (ii) Similar to Theorem 5.9 (ii). □

We give two examples to which the argument filtering transformation can be
applied but the general dummy elimination cannot; the former shows that
removing the unnecessary rules is effective, and the latter shows that a defined
function symbol can usefully be taken as an eliminated symbol.

[Figure: the term $t$ of Example 5.11 drawn as a tree, together with $E(t)$ and $dec(t) = \{5, 3\}$.]

Fig. 4. General Dummy Elimination



Example 5.14. Consider the TRS

$R = \{ f(a) \to f(b),\ b \to g(a) \}$

Let $\pi(g) = []$. Then,

$AFT_\pi(R) = \{ f(a) \to f(b),\ b \to \diamond \}$

The termination of $AFT_\pi(R)$ is easily proved by recursive path order [5]. We
easily observe that the dummy elimination, the distribution elimination and the
general dummy elimination cannot be applied. Indeed, the following systems are
clearly not terminating.

$\tau(g) = (\emptyset, 0)$:  $GDE_g(R) = \{ f(a) \to f(b),\ b \to \diamond,\ b \to a \}$  $(= DE_g(R))$

$\tau(g) = (\{1\}, 1)$:  $GDE_g(R) = \{ f(a) \to f(b),\ b \to a \}$  $(= DIS_g(R))$

Note that termination of $R$ is not easily proved since $R$ is not simply terminating.

Example 5.15. Consider the TRS



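$R = \{ f(f(x)) \to f(g(f(x), x)),\ g(x, y) \to y \}$

Let $\pi(g) = [2]$. Then,

$AFT_\pi(R) = \{ f(f(x)) \to f(g(x)),\ f(f(x)) \to f(x),\ g(y) \to y \}$

The termination of $AFT_\pi(R)$ is easily proved by recursive path order. We easily
observe that the general dummy elimination cannot be applied. Indeed, the
following systems are clearly not terminating.

$\tau(g) = (\emptyset, 0)$:  $GDE_g(R) = \{ f(f(x)) \to f(\diamond),\ f(f(x)) \to f(x),\ f(f(x)) \to x,\ \diamond \to y \}$

$\tau(g) = (\{1\}, 1)$:  $GDE_g(R) = \{ f(f(x)) \to f(x),\ f(f(x)) \to x,\ x \to y \}$

$\tau(g) = (\{2\}, 2)$:  $GDE_g(R) = \{ f(f(x)) \to f(x),\ y \to y \}$

$\tau(g) = (\{1, 2\}, 1)$:  $GDE_g(R) = \{ f(f(x)) \to f(f(x)),\ f(f(x)) \to f(x),\ x \to y \}$

$\tau(g) = (\{1, 2\}, 2)$:  $GDE_g(R) = \{ f(f(x)) \to f(f(x)),\ f(f(x)) \to f(x),\ y \to y \}$

The dummy elimination and the distribution elimination cannot be applied either.
Note that termination of $R$ is not easily proved since $R$ is not simply terminating.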

5.4 Improved General Dummy Elimination 

Definition 5.16. (Improved General Dummy Elimination) [15]

The functions $cap_i$, $E$ and $dec$ are the same as those of the general dummy
elimination. In the case $e \in DF(R)$, we take $IGDE_e(R) = GDE_e(R)$. Otherwise,

$E'(t) = \{s \in E(t) \mid s \text{ includes some defined symbols of } R\}$

$dec'(t) = \{s \in dec(t) \mid s \text{ includes some defined symbols of } R\}$

$IGDE_e(R) = \{cap_i(l) \to r' \mid l \to r \in R,\ r' \in \{cap_i(r)\} \cup E'(r) \cup dec'(r)\}$

Proposition 5.17. [15] If $IGDE_e(R)$ is terminating then $R$ is terminating.



Theorem 5.18. Let $\tau(e) = (I, i)$. In the case $\tau(e) = (\emptyset, 0)$, we define $\pi(e) = []$.
In the case $\tau(e) = (I, i)$ with $i \in I$, we define $\pi(e) = i$. Then, the following
properties hold.

(i) $AFT^{C_i}_\pi(R) \subseteq IGDE_e(R)$ for some $C_i$.

(ii) If $IGDE_e(R)$ is simply terminating then $AFT_\pi(R)$ is simply terminating.

Proof. Similar to Theorem 5.13. □

Finally, we give an example to which the argument filtering transformation
can be applied but the other elimination methods discussed here cannot.

Example 5.19. Consider the TRS

$R = \{ f(f(x)) \to f(g(f(x), x)),\ f(g(x, y)) \to f(y) \}$

Let $\pi(g) = [2]$. Then,

$AFT_\pi(R) = \{ f(f(x)) \to f(g(x)),\ f(f(x)) \to f(x),\ f(g(y)) \to f(y) \}$.

The termination of $AFT_\pi(R)$ is easily proved by recursive path order. We easily
observe that the improved general dummy elimination cannot be applied. Indeed,
the following systems are clearly not terminating.

$\tau(g) = (\emptyset, 0)$:  $IGDE_g(R) = \{ f(f(x)) \to f(\diamond),\ f(f(x)) \to f(x),\ f(\diamond) \to f(y) \}$

$\tau(g) = (\{1\}, 1)$:  $IGDE_g(R) = \{ f(f(x)) \to f(x),\ f(x) \to f(y) \}$

$\tau(g) = (\{2\}, 2)$:  $IGDE_g(R) = \{ f(f(x)) \to f(x),\ f(y) \to f(y) \}$

$\tau(g) = (\{1, 2\}, 1)$:  $IGDE_g(R) = \{ f(f(x)) \to f(f(x)),\ f(f(x)) \to f(x),\ f(x) \to f(y) \}$

$\tau(g) = (\{1, 2\}, 2)$:  $IGDE_g(R) = \{ f(f(x)) \to f(f(x)),\ f(f(x)) \to f(x),\ f(y) \to f(y) \}$

The dummy elimination, the distribution elimination and the general dummy
elimination cannot be applied either. Note that termination of $R$ is not easily proved
since $R$ is not simply terminating.

Abstract. We present a simple and powerful calculus of modules supporting
mutual recursion and higher order features.

The calculus allows one to encode a large variety of existing mechanisms for
combining software components, including parameterized modules, extension
with overriding of object-oriented programming, mixin modules
and extra-linguistic mechanisms like those provided by a linker.

As usual, we first present an untyped version of our calculus and then
a type system which is proved sound w.r.t. the reduction semantics;
moreover we give a translation of other primitive calculi.



Introduction 

Considerable effort has been recently invested in studying theoretical foundations
and developing new forms of module systems; let us mention the wide literature
about foundations and improvements of Standard ML's module system [18] (see
e.g. [16,14]), the notions of mixins (see e.g. [7,11,13] and our previous work [-5])
and units [12], the type-theoretical analysis of recursion between modules
proposed in [10].

Two principles which seem to emerge as common ideas of all these approaches 
are the following. 

First, a module system should have two linguistic levels, a module language 
providing operators for combining software components, constructed on top of 
a core language (following the terminology introduced with Standard ML) for 
defining module components. The module language should have its own typing 
rules and be as independent as possible from the core language; even more, it 
could be in principle instantiated over different core languages (see [17] for an 
effective demonstration) . 

Second, the modules should actually correspond to compilation units, and 
typing rules of the module language should formalize the inter-check phase de- 
scribed in [8]. Note that, indeed, operators of the module language could also 
correspond, in practice, to an extra-linguistic tool like a linker. 

* Partially supported by Murst - Tecniche formali per la specifica, l’analisi, la verifica,
la sintesi e la trasformazione di sistemi software and CNR - Formalismi per la
specifica e la descrizione di sistemi ad oggetti.

** The final version of this paper was produced while visiting Oregon Graduate Insti- 
tute, Portland, OR, USA. 



G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 62-79, 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999

In this paper, we define a primitive module calculus based on these two
principles and suitable for encoding various existing mechanisms for composing
modules, in the same way as the λ-calculus provides a theoretical basis for functional
languages; in particular it supports mutually recursive modules and higher-level
features (modules with module components), and it is parametric in the
underlying core language.

A basic module of this calculus is written, using some syntactic sugar and
considering here for simplicity the untyped version, as follows:

import $X_1$ as $x_1$, .., $X_m$ as $x_m$
export $E_1$ as $Y_1$, .., $E_n$ as $Y_n$
$z_1 = E'_1$, .., $z_p = E'_p$

We write in upper case the names of the components the module either imports
from (input components $X_1, .., X_m$) or exports to (output components $Y_1, .., Y_n$)
the outside. We write in lower case the variables used in definitions inside the module
(the expressions $E_1, .., E_n, E'_1, .., E'_p$, which can be expressions of the core
language or in turn module expressions if the module has module components).
These variables can be either deferred ($x_1, .., x_m$), i.e. associated with some input
component, or locally defined ($z_1, .., z_p$). This distinction between component
names and variables is essential for keeping the module independent from the
core level, as will be explained in more detail later.

Now, as an example of a typical operator which can be easily encoded in our
calculus, consider an operator link used for merging two or more modules. This
operator can be thought of as either an operation provided by a module language in
order to define structured module expressions or an extra-linguistic mechanism
to combine object files provided by a tool for modular software development.
Independently of the view we take, we can informally define this operator as
follows. For any pair of modules $M_1$ and $M_2$, $link(M_1, M_2)$ is well-defined if (a)
the set of the input components of $M_1$ (resp. $M_2$) is included in the set of the
output components of $M_2$ (resp. $M_1$); (b) the sets of the output components of
$M_1$ and $M_2$ are disjoint.

If the conditions (a) and (b) hold then $link(M_1, M_2)$ corresponds to a module
with no input components (called a concrete module) where each input component
of one module has been bound to the definition of the corresponding output
component of the other module.
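Conditions (a) and (b) only concern the interfaces of the two modules, so they can be illustrated at that level alone. The Python sketch below is not the encoding developed in the paper: modules are represented as plain dictionaries, definitions are opaque strings, and the keys `imports`, `exports` and `bound` are assumptions made for the example.

```python
def link(m1, m2):
    """Interface-level sketch of link(M1, M2).

    A module is a dict with
      "imports": {deferred_variable: input_component_name}
      "exports": {output_component_name: definition}
    The result has no input components.
    """
    in1, in2 = set(m1["imports"].values()), set(m2["imports"].values())
    out1, out2 = set(m1["exports"]), set(m2["exports"])
    if not (in1 <= out2 and in2 <= out1):
        raise ValueError("condition (a) fails: unmatched input component")
    if out1 & out2:
        raise ValueError("condition (b) fails: output components overlap")
    # Remember which output component each deferred variable is bound to;
    # variables are tagged by module so that equal names do not clash.
    bound = {("M1", x): X for x, X in m1["imports"].items()}
    bound.update({("M2", x): X for x, X in m2["imports"].items()})
    return {"imports": {},
            "exports": {**m1["exports"], **m2["exports"]},
            "bound": bound}


# Interfaces of two mutually recursive evaluators
BOOL = {"imports": {"ext_ev": "IntEv"}, "exports": {"BoolEv": "ev"}}
INT = {"imports": {"ext_ev": "BoolEv"}, "exports": {"IntEv": "ev"}}
linked = link(BOOL, INT)
```

Tagging the deferred variables by module reflects the point made later in the text: both modules may use the same variable name internally, because only component names are visible from the outside.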

For instance, let the modules BOOL and INT define the evaluation of some 
boolean and integer expressions in a mutually recursive way: 



module BOOL is
  import IntEv as ext_ev
  export ev as BoolEv
  fun ev(be)=if kind(be)==EQ then ext_ev(lhs(be))==ext_ev(rhs(be))
             else if ...
  fun lhs(be)= ... ; fun rhs(be)= ...
  fun kind(be)= ...
end BOOL;

module INT is
  import BoolEv as ext_ev
  export ev as IntEv
  fun ev(ie)=if kind(ie)==IF then
               if ext_ev(cond(ie)) then ev(ifBr(ie)) else ev(elseBr(ie))
             else if ...
  fun cond(ie)= ... ; fun ifBr(ie)= ... ; fun elseBr(ie)= ...
  fun kind(ie)= ...
end INT;

then link(BOOL, INT) intuitively corresponds to the module

module BOOL_INT is
  export iev as IntEv
  export bev as BoolEv
  fun bev(be)=if bkind(be)==EQ then iev(lhs(be))==iev(rhs(be))
              else if ...
  fun lhs(be)= ... ; fun rhs(be)= ... ;
  fun bkind(be)= ...
  fun iev(ie)=if ikind(ie)==IF then
                if bev(cond(ie)) then iev(ifBr(ie)) else iev(elseBr(ie))
              else if ...
  fun cond(ie)= ... ; fun ifBr(ie)= ... ; fun elseBr(ie)= ... ;
  fun ikind(ie)= ...
end BOOL_INT;

Note that the separation between component names and variables allows one 
to use the same identifier ev for the evaluation function in the two modules. 

In the following, we define a simple language where module expressions are
either basic modules which are, apart from syntactic sugar, those described above,
or constructed by three operators (sum, reduct and freeze); moreover, a selection
operator allows one to extract a module component (Sect. 1.1). In Sect. 1.2
we define a reduction semantics for the language. In Sect. 2 we define a typed
version of the calculus. In Sect. 3 we illustrate how various existing constructs
for composing modules can be encoded in the calculus, analyzing in particular
the link operator shown in this introduction (Sect. 3.1), parameterized modules
(Sect. 3.2) and object-oriented features (Sect. 3.3). Finally, in the Conclusion we
summarize the contribution of the paper and outline further work.



1 An Untyped Calculus

1.1 Syntax 

The abstract syntax of the untyped calculus is given in Fig. 1.

Lower case meta-variable $x$ ranges over a countably infinite set Var of
variables, whereas upper case meta-variables $X$ and $Y$ range over a countably
infinite set Name of component names. This distinction at the level of the
calculus reflects, at a more practical level, the separation that a linker makes
between internal names (what we call variables) and external names (what we call
component names).

$$
\begin{array}{rcll}
E & ::= & K \mid M & \\
K & ::= & C & \text{(plain core expressions)} \\
  & \mid & C[x_i \stackrel{i \in I}{\mapsto} E_i] & \text{(enriched core expressions)} \\
M & ::= & [\iota;\ o;\ \rho] & \text{(basic module)} \\
  & \mid & E_1 + E_2 & \text{(sum)} \\
  & \mid & {}_{\sigma^\iota|}E_{|\sigma^o} & \text{(reduct)} \\
  & \mid & freeze_{\sigma^f}(E) & \text{(freeze)} \\
  & \mid & E.X & \text{(selection)} \\
\iota & ::= & x_i \stackrel{i \in I}{\mapsto} X_i & \text{($\iota$-assignment)} \\
o & ::= & X_i \stackrel{i \in I}{\mapsto} E_i & \text{($o$-assignment)} \\
\rho & ::= & x_i \stackrel{i \in I}{\mapsto} E_i & \text{($\rho$-assignment)} \\
\sigma & ::= & X_i \stackrel{i \in I}{\mapsto} Y_i,\ Y_j^{\,j \in J} & \text{(renaming)}
\end{array}
$$

Fig. 1. Abstract syntax of the untyped calculus.



The meta-variable $E$ ranges over the set of all expressions (denoted by $\mathcal{E}_E$)
containing both the set of module expressions (denoted by $\mathcal{E}_M$) and of core
language expressions possibly having module sub-terms (denoted by $\mathcal{E}_K$).

The meta-variable $C$ ranges over the set of pure core expressions (denoted
by $\mathcal{E}_C$). The syntax is parametric in $\mathcal{E}_C$; we assume that $Var \subseteq \mathcal{E}_C$.

In the production $K ::= C[x_i \stackrel{i \in I}{\mapsto} E_i]$ the substitution symbol is used at the
meta-level, i.e. $C[x_i \stackrel{i \in I}{\mapsto} E_i]$ denotes the expression obtained from the core
expression $C$ by the usual capture-avoiding substitution of expressions $E_i$ for free
variables $x_i$ ($i \in I$), enjoying all the standard properties. Expressions of this
kind are needed for the (selection) reduction rule (see Fig. 2) which otherwise
would not be well-defined. We require that this production can be applied only
under the conditions $C \notin Var$, $I \neq \emptyset$ and $x_i \in FV(C)$ for all $i \in I$, in order to
rule out some trivial and redundant cases.

The independence of the calculus from the core language is effective, in the 
sense that reduction and typing rules we will provide are constructed on top 
of those of the core language, so that a type-checker or an interpreter for the 
module language could be constructed in a modular way enriching one for the 
core level, as done in [17]. The prototype we have developed for the calculus is 
actually built following this idea (see the Conclusion). 

A basic module corresponds to the ability of building a module by collecting
a set of components. A basic module is made up of an assignment of input
names to deferred variables (also called $\iota$-assignment), of expressions to output
names (also called $o$-assignment) and of expressions to local variables (also called
$\rho$-assignment or substitution); all these assignments have a scope that is
indicated by the square bracket delimiters. The notation $x_i \stackrel{i \in I}{\mapsto} X_i$ is used for
representing the unique surjective and finite map $\iota$ s.t. $dom(\iota) = \{x_i \mid i \in I\}$,
$cod(\iota) = \{X_i \mid i \in I\}$ and $\iota(x_i) = X_i$ for all $i \in I$. Notice that in opposition to
the meta-substitution used for defining $K$, here the finite set of indexes $I$ can be
empty. We assume that for any $i_1$ and $i_2$ in $I$, if $i_1 \neq i_2$ then $x_{i_1} \neq x_{i_2}$. A similar
notation is used also for the other kinds of assignments. Finally, we assume
that the sets of deferred and local variables are disjoint ($dom(\iota) \cap dom(\rho) = \emptyset$). On
the contrary, the sets of input and output components can have a non-empty
intersection. Finally, note that the calculus supports higher order modules.

For instance, the basic module:

[ext_ev ↦ IntEv; BoolEv ↦ ev; ev ↦ e, lhs ↦ ..., rhs ↦ ..., kind ↦ ...]

with

e = λbe.if kind(be) == EQ then ext_ev(lhs(be)) == ext_ev(rhs(be)) else if ...

corresponds to the module BOOL defined in the Introduction.

As already mentioned, there exist several (both technical and methodologi- 
cal) motivations for keeping component names separated from variables. 

Technically speaking, variables can be α-converted, in the sense that we can
rename (in an appropriate way) the variables of an expression e without changing
the observable semantics of e. The same cannot be done for component names
(see Sect. 1.2). Furthermore, if we want the module calculus to be independent
from the core level, then component names necessarily have to be independent
from the variables of the core language.

Methodologically speaking, this separation is a way of abstracting from the
particular programming language a module comes from, even allowing composition
of heterogeneous software components; variables correspond to the particular
dialect spoken inside each module, whereas names represent a sort of lingua
franca which allows modules to talk to each other.

Analogous distinctions are those between program variables and labels that
connect fragments in [14], those between variables and field/method names
in Abadi and Cardelli’s object calculus [1] and those between names and
identifiers in [16]; also in MzScheme’s units [12] imported and exported variables
have separate internal (binding) and external (linking) names, and the internal
names within a unit can be α-renamed.

Modules can be merged together by means of the sum operator. 

The reduct operator is a powerful form of renaming of the component names;
input and output components are separately renamed via two renamings (see
below) $\sigma^\iota$ and $\sigma^o$, respectively, which are two finite maps over Name.

The freeze operator allows the binding between input and output names; this
binding is specified by the renaming $\sigma^f$.

Finally, it is possible to access an output component from the outside via the 
selection operator. 

The meta-variable $\sigma$ ranges over the set of renamings (finite maps over
Name). The notation $X_i \stackrel{i \in I}{\mapsto} Y_i,\ Y_j^{\,j \in J}$ is used for representing the unique map
$\sigma$ s.t. $dom(\sigma) = \{X_i \mid i \in I\}$, $cod(\sigma) = \{Y_i \mid i \in I \cup J\}$ and $\sigma(X_i) = Y_i$, for
all $i \in I$. We assume that for any $i_1$ and $i_2$ in $I$, if $i_1 \neq i_2$ then $X_{i_1} \neq X_{i_2}$
and, similarly, for any $j_1$ and $j_2$ in $J$, if $j_1 \neq j_2$ then $Y_{j_1} \neq Y_{j_2}$. Furthermore,
$\{Y_i \mid i \in I\}$ and $\{Y_j \mid j \in J\}$ are assumed to be disjoint sets.

We introduce the following abbreviations for the reduct: if $\sigma^\iota$ is an inclusion,
i.e. of the form $X_i \stackrel{i \in I}{\mapsto} X_i,\ X_j^{\,j \in J}$, then ${}_{\sigma^\iota|}E_{|\sigma^o}$ is written ${}_{cod(\sigma^\iota)|}E_{|\sigma^o}$; if in particular
$J = \emptyset$, i.e. $\sigma^\iota$ is the identity, then we simply write $E_{|\sigma^o}$. Symmetrically, if $\sigma^o$ is
of the form $X_i \stackrel{i \in I}{\mapsto} X_i,\ X_j^{\,j \in J}$, then ${}_{\sigma^\iota|}E_{|\sigma^o}$ is written ${}_{\sigma^\iota|}E_{|dom(\sigma^o)}$ and, if $\sigma^o$ is the
identity, then we simply write ${}_{\sigma^\iota|}E$.

Notations and Definitions: for any module expression $E$, let $FV(E)$ denote the
set of free variables of $E$ inductively defined by:

$FV(C)$ is core language dependent

$FV(C[x_i \stackrel{i \in I}{\mapsto} E_i]) = (FV(C) \setminus \{x_i \mid i \in I\}) \cup \bigcup_{i \in I} FV(E_i)$

$FV([\iota;\ o;\ \rho]) = (\bigcup_{E \in cod(o)} FV(E) \cup \bigcup_{E \in cod(\rho)} FV(E)) \setminus (dom(\iota) \cup dom(\rho))$

$FV(E_1 + E_2) = FV(E_1) \cup FV(E_2)$

$FV({}_{\sigma^\iota|}E_{|\sigma^o}) = FV(E)$

$FV(freeze_{\sigma^f}(E)) = FV(E)$

$FV(E.X) = FV(E)$

As expected, at the module level the only binding construct is for basic modules.
If $E = [\iota;\ o;\ \rho]$ then we denote by $BV(E)$ the set $dom(\iota) \cup dom(\rho)$ of its binding
variables; finally, we define $V(E)$ to be $BV(E) \cup FV(E)$ and $AV(E)$ to be the
set of all variables in $E$.

We extend the notation for substitution to $\rho$-assignment, i.e. $E[\rho]$ denotes
the expression obtained by capture-avoiding simultaneous substitution of $E_i$ for
the free occurrences of $x_i$ in $E$, for all $i \in I$. Finally, if $\rho'$ is another $\rho$-assignment,
then $\rho[\rho']$ denotes the $\rho$-assignment $x_i \stackrel{i \in I}{\mapsto} E_i[\rho']$; an analogous notation is used for
$o$-assignments.



1.2 Reduction Rules 

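The reduction rules for the untyped calculus are defined in Fig. 2.

(core)  $\dfrac{C \to_C C'}{C \to C'}$

(sub)  $\dfrac{M_i \to E_i}{C[x_i \stackrel{i \in I}{\mapsto} M_i] \to C[x_i \stackrel{i \in I}{\mapsto} E_i]}$  $i \in I$

(α1)  $[x \mapsto X, \iota;\ o;\ \rho] \to [x' \mapsto X, \iota;\ o[x \mapsto x'];\ \rho[x \mapsto x']]$  if $x' \notin AV(E)$

(α2)  $[\iota;\ o;\ x \mapsto E, \rho] \to [\iota;\ o[x \mapsto x'];\ x' \mapsto E[x \mapsto x'], \rho[x \mapsto x']]$  if $x' \notin AV(E)$

(sum)  $E_1 + E_2 \to [\iota_1, \iota_2;\ o_1, o_2;\ \rho_1, \rho_2]$  if $E_i = [\iota_i;\ o_i;\ \rho_i]$ for $i = 1, 2$, $BV(E_1) \cap V(E_2) = \emptyset$, $BV(E_2) \cap V(E_1) = \emptyset$ and $dom(o_1) \cap dom(o_2) = \emptyset$

(reduct)  ${}_{\sigma^\iota|}[\iota;\ o;\ \rho]_{|\sigma^o} \to [\sigma^\iota \circ \iota;\ o \circ \sigma^o;\ \rho]$  if $cod(\iota) \subseteq dom(\sigma^\iota)$ and $cod(\sigma^o) \subseteq dom(o)$

(freeze)  $freeze_{\sigma^f}([\iota_1, \iota_2;\ o;\ \rho]) \to [\iota_2;\ o;\ \rho, o \circ \sigma^f \circ \iota_1]$  if $cod(\iota_1) \cap cod(\iota_2) = \emptyset$ and $dom(\sigma^f) = cod(\iota_1)$

(selection)  $[\ ;\ o;\ \rho].X \to o(X)[x_i \stackrel{i \in I}{\mapsto} E_i.Y]$  if $X \in dom(o)$, $E_i = [\ ;\ Y \mapsto \rho(x_i);\ \rho]$ for all $i \in I$, and $dom(\rho) = \{x_i \mid i \in I\}$

Fig. 2. Reduction rules for the untyped calculus.

Values are all basic module expressions together with all core values. As
usual, besides the rules of Fig. 2, the rule for context closure is also implicitly
defined. The following definition of context determines the reduction strategy we
refer to in this paper:

$\mathcal{C}[\,] ::= [\,] \mid \mathcal{C}[\,] + E \mid E + \mathcal{C}[\,] \mid {}_{\sigma^\iota|}\mathcal{C}[\,]_{|\sigma^o} \mid freeze_{\sigma^f}(\mathcal{C}[\,]) \mid \mathcal{C}[\,].X$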

Core: the definition of the reduction relation for the module calculus is
parametric in the core reduction relation $\to_C$; the meaning of the (core) rule is obvious.

Substitution: the (sub) rule is needed for reducing enriched core terms to
plain core terms, so that eventually the core reduction relation can be applied.
The use of meta-variable $M$ rules out all non-interesting cases; this simplifies
the proof of subject reduction (Theorem 1). Finally, because of our assumptions
over the syntax, recall that $C \notin Var$, $I \neq \emptyset$ and $x_i \in FV(C)$ for all $i \in I$.

α-conversion: for simplicity, the α-rule for the binding construct of basic
modules has been split into two rules which separately deal with deferred and
local variables renaming, respectively.

Note that, as mentioned before, the separation between variables and components
is essential for having a correct α-rule; indeed, α-conversion makes
sense for variables but not for names. For instance, the term $E = [\ ;\ Y \mapsto 0;\ ]$
is not observationally equivalent to $E' = [\ ;\ X \mapsto 0;\ ]$ since these two terms
clearly behave differently w.r.t. the context $\mathcal{C}[\,].Y$ (or, equivalently, $\mathcal{C}[\,].X$): $E.Y$
reduces to 0, whereas the reduction for $E'.Y$ gets stuck. It should be clear from
this example that the crucial point lies in recognizing which are the entities
which can be correctly α-converted rather than in the separation between Var
and Name: technically, these two sets could be equal, however it is better to
keep them separated since, conceptually, variables and components are different
entities.

Sum: the reduction rule for sum is straightforward; this operation simply has
the effect of gluing together two modules. However, particular attention is
needed for correctly applying this rule. The binding variables of one module must
be disjoint from those of the other, otherwise the result of the sum would not
be syntactically correct (recall the assumptions for basic modules in Sect. 1.1).
Furthermore, we have to make sure that the free variables of one module are
not captured by the binding variables of the other. As a result, the set of binding
variables of one module must be disjoint from the set of (either binding or free)
variables of the other. If this is not the case, then the (sum) rule can be applied
only after an appropriate α-conversion of (possibly) both the modules.

The output components of the two modules must be disjoint for the same 
reason explained for binding variables; however, in opposition to what happens 
for binding variables, if this condition does not hold for output components then 
the reduction gets stuck since this conflict cannot be resolved by an a-conversion. 
The only way to solve this problem is to explicitly rename the output components 
in an appropriate way by means of the reduct operator, thus changing the term. 
The sets of the input components of the two modules can have a non empty 
intersection and the resulting set of the input components of the sum is simply 
the union of them; this means that the input components having the same name 
in the two modules are shared in the resulting sum. 

Finally, note that sum represents a very primitive way of assembling two
modules, since it provides no way of inter-connecting their components (apart
from the fact that input components are shared^). This can be done only at a
second stage, after sum has been performed, by means of the freeze operator
(see below). In other words, sum corresponds to the ability to collect pieces
of unrelated code.
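
The side conditions of the (sum) rule can be made concrete with a small
executable sketch. The Python below is only an illustration and not part of the
calculus: the Module class, the representation of an expression as a pair of
source text and free-variable set, and all function names are assumptions
introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical encoding of a basic module [iota; o; rho].

    iota: deferred variable -> input component name
    out:  output component name -> (expression text, free variables)
    rho:  local variable -> (expression text, free variables)
    """
    iota: dict
    out: dict
    rho: dict


def binders(m):
    """Binding variables: the deferred and the local variables of m."""
    return set(m.iota) | set(m.rho)


def variables(m):
    """All variables of m, binding or occurring free in some expression."""
    used = set()
    for _text, free in list(m.out.values()) + list(m.rho.values()):
        used |= set(free)
    return binders(m) | used


def module_sum(m1, m2):
    """Glue two modules together, enforcing the side conditions of (sum)."""
    if binders(m1) & variables(m2) or binders(m2) & variables(m1):
        # Recoverable: an alpha-conversion of one or both modules fixes it.
        raise ValueError("binding variables clash: alpha-convert first")
    if set(m1.out) & set(m2.out):
        # Not recoverable by alpha-conversion: the reduction is stuck.
        raise ValueError("output components clash: reduction is stuck")
    return Module({**m1.iota, **m2.iota},
                  {**m1.out, **m2.out},
                  {**m1.rho, **m2.rho})


m1 = Module({"x": "X"}, {"A": ("x + 1", {"x"})}, {})
m2 = Module({"y": "X", "z": "Z"}, {"B": ("y * z", {"y", "z"})}, {})
total = module_sum(m1, m2)
# Both x and y are deferred to the same input component X, which is shared.
print(sorted(set(total.iota.values())))
```

In this sketch the two failure modes are kept apart on purpose: a clash between
variables could be repaired by renaming, whereas a clash between output
components can only be repaired by changing the term with a reduct.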

Reduct: the reduct operator performs only a renaming of component names and
does not change the ρ-assignment and the variables of a module; its effect is
simply a composition of maps. However, this form of renaming is rather
powerful. Input and output names are renamed separately, by specifying two
renamings σ^ι and σ^o, respectively^. The two renamings are contravariant for
the same reason that a function from A to B can be converted into a function
from A′ to B′ whenever two conversion functions from A′ to A and from B to B′
are provided. Note that the two renamings can be non-injective and
non-surjective. A non-injective map σ^ι allows sharing of input components,
whereas a non-surjective one is used for adding dummy input components (in the
sense that no variable is associated with them); a non-injective map σ^o
allows duplication of definitions, whereas a non-surjective one is used for
hiding output components.

Finally, note that the syntactic representation chosen for ι-assignments is
not suitable for expressing non-surjective maps, although composition of such
assignments with non-surjective renamings may produce non-surjective
assignments. Hence, we represent a non-surjective assignment by associating a
fresh variable with each input component which is not reached by ι. For
instance, reducing the module [x ↦ X; Y ↦ x + 1, Z ↦ 1; ] with an input
renaming that adds the input component W and an output renaming that hides Z
yields [x ↦ X, w ↦ W; Y ↦ x + 1; ], where w is a fresh variable.
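
The reading of reduct as a composition of maps can be sketched in a few lines
of Python. This is an illustrative encoding, not the paper's definition:
modules are triples of dictionaries, expressions are opaque strings, the extra
new_inputs argument (needed to express a non-surjective input renaming) and the
naming scheme for fresh variables are assumptions made here.

```python
def reduct(sigma_in, new_inputs, module, sigma_out):
    """Apply sigma_in | module | sigma_out, returning (iota, out, rho).

    sigma_in:   old input name -> new input name
    new_inputs: all input names of the result (the codomain of sigma_in)
    sigma_out:  new output name -> old output name
    """
    iota, out, rho = module
    # New iota-assignment: sigma_in composed with iota.
    new_iota = {x: sigma_in[name] for x, name in iota.items()}
    # New o-assignment: o composed with sigma_out.
    new_out = {name: out[old] for name, old in sigma_out.items()}
    # Simplification: freshness is checked against binding variables only.
    taken = set(iota) | set(rho)
    for name in sorted(set(new_inputs) - set(new_iota.values())):
        fresh = "_" + name.lower()
        while fresh in taken:
            fresh += "_"
        taken.add(fresh)
        new_iota[fresh] = name   # dummy input: no expression mentions it
    return new_iota, new_out, dict(rho)


module = ({"x": "X"}, {"Y": "x + 1", "Z": "1"}, {})
# Non-surjective input renaming (adds W), non-surjective output renaming
# (hides Z), as in the example of the text.
result = reduct({"X": "X"}, {"X", "W"}, module, {"Y": "Y"})
print(result)
```

A non-injective sigma_in (two old names sent to one new name) makes two deferred
variables share an input component, and a non-injective sigma_out duplicates a
definition under two output names.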

Freeze: as already stated, the freeze operator is essential for binding input
to output components in order to accomplish inter-connection between
components. In other words, freeze corresponds to the phase, typical of any
linker, of external name resolution, which immediately follows the merge of
the object files. However, in this case the resolution is neither implicit nor
exhaustive. A renaming σ^f explicitly specifies how the resolution has to be
performed, associating output with input components; furthermore, the domain
of σ^f can be a proper subset of all input components of the module, so that
the resolution is partial.

The effect of applying the freeze operator is that all input components that
are resolved (represented by the set dom(σ^f)) disappear and the deferred
variables mapped into them (represented by the set dom(ι_1)) become local.
These variables are associated with the definition of the output component to
which they are bound by σ^f (i.e., o(σ^f(ι_1(x))), for all x ∈ dom(ι_1)).



^ We could avoid implicit sharing of input components in the (sum) rule by requiring
dom(ι_1) ∩ dom(ι_2) = ∅, thus forcing the user to make this sharing explicit by means
of the reduct operator.

^ Indeed in the primitive calculus there exists no relationship between the names of 
the input and output components and the fact that these two sets of names may 
not be disjoint has no semantic consequence; we will consider later (Sect. 3.3) how
to encode in the calculus module systems with virtual, i.e. both input and output, 
components. 

The deferred variables and the input components which are not resolved
(represented by dom(ι_2) and cod(ι_2), respectively) and the o-assignment are
not affected.

As an example, the module expression

freeze_{F↦G}([f ↦ F, k ↦ K; G ↦ λx. if x = 0 then 1 else k * f(x − 1); ])

reduces to

[k ↦ K; G ↦ e; f ↦ e]

with e denoting the expression λx. if x = 0 then 1 else k * f(x − 1).
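
The example above can be replayed with a minimal sketch of freeze. The encoding
is an assumption made for illustration (modules as triples of Python
dictionaries, expressions as opaque strings); only the behaviour described in
the text is modelled.

```python
def freeze(sigma_f, module):
    """Resolve the input components in dom(sigma_f) against output components.

    module is a triple (iota, out, rho) of dicts; sigma_f maps input component
    names to output component names. Expressions are opaque strings.
    """
    iota, out, rho = module
    if not set(sigma_f) <= set(iota.values()):
        raise ValueError("sigma_f must be defined on input components only")
    new_iota, new_rho = {}, dict(rho)
    for var, name in iota.items():
        if name in sigma_f:
            # var becomes local, bound to the definition of the chosen output.
            new_rho[var] = out[sigma_f[name]]
        else:
            new_iota[var] = name   # unresolved: still deferred
    return new_iota, dict(out), new_rho


e = "λx. if x = 0 then 1 else k * f(x − 1)"
module = ({"f": "F", "k": "K"}, {"G": e}, {})
print(freeze({"F": "G"}, module))
```

Since the domain of sigma_f may be a proper subset of the input components, the
input component K stays deferred here, which is the partial resolution
mentioned in the text.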

Selection: finally, output components can be accessed from the outside by
means of the selection operator. Selection is legal only for modules where all
input components have been resolved (called concrete modules), that is, for
modules having an empty ι-assignment. Furthermore, the selected name must be
in dom(o) (i.e., it must be an output component of the module).

Since definitions in modules can be mutually dependent, the expression
corresponding to the selected component (determined by o(X)) may contain some
(necessarily local) variables {x_i | i ∈ I} which have to be replaced with
their corresponding definitions. Therefore, for each i ∈ I, the variable x_i
is replaced with the term E_i.Y, where E_i = [; Y ↦ ρ(x_i); ρ] is equal to the
module E upon which selection is performed, with the exception that only the
variable x_i is made visible via the component name Y (since we are interested
in selecting only the expression assigned to x_i); this variable must be
visible even though it is local in E, since the definition of an output
component has free access to the local variables of its module. Note that
recursion is obtained by propagating the ρ-assignment of E in the resulting
term by means of the substitution x_i ↦ E_i.Y. As an example, the module
expression

[; G ↦ g; k ↦ 2, g ↦ λx. if x = 0 then 1 else k * g(x − 1)].G

reduces to

λx. if x = 0 then 1 else E_1.Z * E_2.Z(x − 1)

where E_1, E_2 and e denote [; Z ↦ 2; k ↦ 2, g ↦ e], [; Z ↦ e; k ↦ 2, g ↦ e]
and λx. if x = 0 then 1 else k * g(x − 1), respectively.
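
The way recursion arises from selection can be imitated semantically. The
sketch below is not the syntactic substitution x_i ↦ E_i.Y of the calculus: as
an assumption made for illustration, expressions are Python callables that
receive a lookup function, and every use of a local variable re-enters the
module, which plays the role of E_i.Y.

```python
def select(module, name):
    """Select an output component of a concrete module (iota, out, rho)."""
    iota, out, rho = module
    if iota:
        raise ValueError("selection is legal only on concrete modules")
    if name not in out:
        raise KeyError(name)

    def var(x):
        # Each use of a local variable evaluates its definition in the same
        # rho-assignment, so mutually dependent definitions unfold on demand.
        return rho[x](var)

    return out[name](var)


concrete = (
    {},
    {"G": lambda var: var("g")},
    {
        "k": lambda var: 2,
        "g": lambda var: (lambda x: 1 if x == 0 else var("k") * var("g")(x - 1)),
    },
)
g = select(concrete, "G")
print(g(3))
```

With k bound to 2, the selected function multiplies by 2 at each recursive
step, so the call above computes 2 * 2 * 2 * 1.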

2 A Typed Calculus 

In this section we address the problem of defining a sound type system for the 
calculus presented in Sect. 1. As usual, soundness means that the reduction of
each closed and well-typed term never gets stuck, so that it is possible to stat- 
ically detect errors like clashes of output components while summing modules, 
binding of deferred variables to expressions of the wrong type, selection of output 
components not present in a module and so on. 

Since here we are mainly interested in type checking rather than in type 
inference algorithms, the terms of the typed calculus are decorated with types 
so that they are slightly different from those of the untyped calculus. 

The type system we define turns out to satisfy the subject reduction property 
(see [6] for technical details); we conjecture that a proof of the soundness of the
type system can be obtained by adapting the proof for subject reduction. 

The types of the calculus are defined by τ ::= σ | [X_i:τ_i^{i∈I}; X_j:τ_j^{j∈J}].

A type is either a core type σ (i.e., a type of the core language) or a module
type [X_i:τ_i^{i∈I}; X_j:τ_j^{j∈J}] (abbreviated by [Σ^ι; Σ^o]). For the sake
of simplicity we do not introduce recursive types. Notice that, according to
the definition above, core types cannot be built on top of module types; hence
we are forcing the core and module language to be stratified, so that modules
are not first-class values. See the Conclusion for a discussion of this
restriction.

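A module type is well-formed if it contains well-formed types and input (resp.
output) components are not overloaded. This is formalized by the judgment
⊢ [Σ^ι; Σ^o] defined by the following rules:

\[
\frac{\vdash \Sigma^{\iota} \qquad \vdash \Sigma^{o}}
     {\vdash [\Sigma^{\iota};\ \Sigma^{o}]}
\qquad\qquad
\frac{\vdash \tau_i \quad \forall i \in I}
     {\vdash X_i{:}\tau_i^{\,i \in I}}
\quad X_i^{\,i \in I}\ \text{distinct}
\]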



Intuitively, if a module M has type [X_i:τ_i^{i∈I}; X_j:τ_j^{j∈J}] then
{X_i | i ∈ I} and {X_j | j ∈ J} represent its sets of input and output
components, respectively.

The type annotation X_i:τ_i says that the input component X_i can be correctly
bound only to expressions of type τ_i, whereas X_j:τ_j says that the output
component X_j is associated with an expression of type τ_j.

The syntax of the typed calculus is the same as that of the untyped version,
apart from basic modules, where deferred and local variables are decorated
with types: [x_i:τ_i ↦ X_i^{i∈I}; X_j ↦ E_j^{j∈J}; x_k:τ_k ↦ E_k^{k∈K}].

The type decoration must be coherent, in the sense that if ι(x_{i1}) = ι(x_{i2})
then τ_{i1} = τ_{i2} for any pair of deferred variables x_{i1}:τ_{i1} and
x_{i2}:τ_{i2}, so that the type of the module is well-formed.

For instance, the module

[f:int → int ↦ F, k:int ↦ K; G ↦ λx:int. if x = 0 then 1 else k * f(x − 1); ]

has type [F:int → int, K:int; G:int → int].

The typing rules for the typed calculus are defined in Fig. 3. 

A context Γ is a finite (possibly empty) sequence of assignments of types to
variables, where variable repetition is allowed. The predicate Γ(x) = τ is
inductively defined as follows:

– ∅(x) = τ is false for any variable x and type τ;

– (Γ, x:τ)(x′) = τ′ iff (x = x′ and τ = τ′) or (x ≠ x′ and Γ(x′) = τ′).

The predicates x ∈ Γ and Γ ⊆ Γ′ are defined as follows: x ∈ Γ iff there exists
τ s.t. Γ(x) = τ; Γ ⊆ Γ′ iff for any variable x and any type τ, Γ(x) = τ implies
Γ′(x) = τ.
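
The inductive definition above says that, when a variable is repeated, the
rightmost assignment is the visible one. A minimal sketch, assuming contexts
are encoded as Python lists of (variable, type) pairs with types written as
strings (both choices are illustrative):

```python
def lookup(context, x):
    """Return the type tau with context(x) = tau, or None if x is unbound."""
    for y, tau in reversed(context):   # the rightmost assignment wins
        if y == x:
            return tau
    return None


def member(x, context):
    """x ∈ context: some type is assigned to x."""
    return lookup(context, x) is not None


def included(small, large):
    """small ⊆ large: every binding visible in small is visible in large."""
    return all(lookup(large, x) == lookup(small, x) for x, _tau in small)


gamma = [("x", "int"), ("y", "bool"), ("x", "bool")]
print(lookup(gamma, "x"))
```

Note that inclusion compares visible bindings only, so a context assigning int
to x is not included in gamma, where the later assignment of bool shadows the
earlier one.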

The (core) typing rule expresses the dependence on the core type system; core
typing judgments have form Γ ⊢_c C:σ, where Γ is a context containing only
core types, C is a (plain) core expression and σ a core type. The side
condition Γ ⊆ Γ′ is essential for eliminating the part of the context which is
not well-formed at the core level. For instance, when trying to prove

∅ ⊢ [x ↦ X; V ↦ x.Y + 1; ]:[X:[; Y:int]; V:int],

we end up with proving x:[; Y:int], y:int ⊢ y + 1:int, which can be derived at
the core level only if we get rid of the assumption x:[; Y:int].
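
\[
\textrm{(core)}\quad
\frac{\Gamma \vdash_c C : \sigma}{\Gamma' \vdash C : \sigma}
\quad \Gamma \subseteq \Gamma'
\]

\[
\textrm{(sub)}\quad
\frac{\Gamma, x_i{:}\tau_i^{\,i \in I} \vdash C : \sigma
      \qquad \Gamma \vdash E_i : \tau_i \quad \forall i \in I}
     {\Gamma \vdash C[x_i \mapsto E_i^{\,i \in I}] : \sigma}
\quad \forall i \in I.\ x_i \notin \Gamma
\]

\[
\textrm{(basic)}\quad
\frac{\begin{array}{l}
      \vdash [\Sigma^{\iota};\ \Sigma^{o}] \\
      \Gamma, x_i{:}\tau_i^{\,i \in I}, x_k{:}\tau_k^{\,k \in K} \vdash E_j : \tau_j \quad \forall j \in J \\
      \Gamma, x_i{:}\tau_i^{\,i \in I}, x_k{:}\tau_k^{\,k \in K} \vdash E_k : \tau_k \quad \forall k \in K
      \end{array}}
     {\Gamma \vdash [x_i{:}\tau_i \mapsto X_i^{\,i \in I};\ X_j \mapsto E_j^{\,j \in J};\ x_k{:}\tau_k \mapsto E_k^{\,k \in K}] : [\Sigma^{\iota};\ \Sigma^{o}]}
\quad
\begin{array}{l}
\{\Sigma^{\iota}\} = \{X_i{:}\tau_i^{\,i \in I}\} \\
\Sigma^{o} = X_j{:}\tau_j^{\,j \in J}
\end{array}
\]

\[
\textrm{(sum)}\quad
\frac{\vdash [\Sigma^{\iota}, \Sigma_1^{\iota}, \Sigma_2^{\iota};\ \Sigma_1^{o}, \Sigma_2^{o}]
      \qquad \Gamma \vdash E_1 : [\Sigma^{\iota}, \Sigma_1^{\iota};\ \Sigma_1^{o}]
      \qquad \Gamma \vdash E_2 : [\Sigma^{\iota}, \Sigma_2^{\iota};\ \Sigma_2^{o}]}
     {\Gamma \vdash E_1 + E_2 : [\Sigma^{\iota}, \Sigma_1^{\iota}, \Sigma_2^{\iota};\ \Sigma_1^{o}, \Sigma_2^{o}]}
\]

\[
\textrm{(reduct)}\quad
\frac{\Gamma \vdash E : [\Sigma^{\iota};\ \Sigma^{o}]}
     {\Gamma \vdash \sigma^{\iota} | E | \sigma^{o} : [{\Sigma'}^{\iota};\ {\Sigma'}^{o}]}
\quad
\begin{array}{l}
\sigma^{\iota} : \Sigma^{\iota} \to {\Sigma'}^{\iota} \\
\sigma^{o} : {\Sigma'}^{o} \to \Sigma^{o}
\end{array}
\]

\[
\textrm{(freeze)}\quad
\frac{\Gamma \vdash E : [\Sigma^{f}, \Sigma^{\iota};\ \Sigma^{o}]}
     {\Gamma \vdash \mathit{freeze}_{\sigma^{f}}(E) : [\Sigma^{\iota};\ \Sigma^{o}]}
\quad \sigma^{f} : \Sigma^{f} \to \Sigma^{o}
\]

\[
\textrm{(selection)}\quad
\frac{\Gamma \vdash E : [\,;\ X_j{:}\tau_j^{\,j \in J}]}
     {\Gamma \vdash E.X_k : \tau_k}
\quad k \in J
\]

Fig. 3. Typing rules for the typed calculus.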



The (sub) typing rule corresponds to a substitution lemma for enriched core
expressions. Recall that C ∉ Var, I ≠ ∅ and x_i ∈ FV(C) for all i ∈ I; the
side condition says that no type assumptions are needed for the variables we
substitute for.

In the (basic) typing rule the side condition {Σ^ι} = {X_i:τ_i^{i∈I}} means
that for any i ∈ I, X_i:τ_i ∈ Σ^ι and for any X:τ ∈ Σ^ι, X:τ ∈ X_i:τ_i^{i∈I}
(recall that there might be repetitions in X_i:τ_i^{i∈I}).

The (sum) typing rule allows sharing of input components having the same name
and type (represented by Σ^ι), whereas output components cannot be shared. The
notation Σ_1, Σ_2 denotes the concatenation of Σ_1 with Σ_2.

The side conditions having form σ: X_i:τ_i^{i∈I} → X_j:τ_j^{j∈J} ensure that
the renaming σ preserves types (see typing rules (reduct) and (freeze));
formally, this means that σ: {X_i | i ∈ I} → {X_j | j ∈ J} and σ(X_i) = X_j ⇒
τ_i = τ_j for all i ∈ I, j ∈ J.
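
The type-preservation condition on renamings, and its use in the (freeze) rule,
can be sketched as a small checker. The encoding is an assumption made for
illustration: a module type is a pair of dictionaries from component names to
types, types are plain strings compared syntactically, and the function names
are hypothetical.

```python
def preserves_types(sigma, source, target):
    """sigma: source -> target is total on source and keeps each type."""
    return set(sigma) == set(source) and all(
        sigma[name] in target and source[name] == target[sigma[name]]
        for name in sigma
    )


def freeze_type(sigma_f, module_type):
    """Type of freeze_{sigma_f}(E), given the type (inputs, outputs) of E."""
    inputs, outputs = module_type
    # Split the inputs into the resolved part and the part left deferred.
    frozen = {n: t for n, t in inputs.items() if n in sigma_f}
    rest = {n: t for n, t in inputs.items() if n not in sigma_f}
    if not preserves_types(sigma_f, frozen, outputs):
        raise TypeError("sigma_f does not preserve types")
    return rest, dict(outputs)


fact_type = ({"F": "int -> int", "K": "int"}, {"G": "int -> int"})
print(freeze_type({"F": "G"}, fact_type))
```

Binding K to G instead is rejected, since K expects an int while G provides a
function; this is the kind of error the type system detects statically.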

The reduction rules for the typed calculus are simply the rules of Fig. 2 an- 
notated with types. 

In order to prove subject reduction, we need some assumptions over the core 
language on top of which the module language is defined. 

1. (No Interference) ℰ_M ∩ ℰ_C = ∅ and, if x_i ∈ FV(C) for all i ∈ I, then
   C[x_i ↦ E_i^{i∈I}] ∈ ℰ_C ⟺ ∀i ∈ I. E_i ∈ ℰ_C.

2. (Weakening) If Γ ⊢_c C:σ and Γ ⊆ Γ′ (Γ′ core context) then Γ′ ⊢_c C:σ.

3. (Substitution) If Γ, x_i:σ_i^{i∈I} ⊢_c C:σ and Γ ⊢_c C_i:σ_i for all i ∈ I,
   then Γ ⊢_c C[x_i ↦ C_i^{i∈I}]:σ.

4. (Subject Reduction) If C →_c C′ and Γ ⊢_c C:σ then Γ ⊢_c C′:σ.

Assumption 1 avoids ambiguous expressions that may have different semantics
at the core and the module level. For instance, we cannot use the usual
arithmetic symbol + at the core level as long as this symbol is used for
module sum, since expressions like x + y would be ambiguous. For the same
reason, we do not want all terms obtained as C[x_i ↦ M_i^{i∈I}] (with I ≠ ∅,
C ∉ Var and x_i ∈ FV(C) for all i ∈ I) to belong to the set ℰ_C of core
expressions. The other direction of the implication could be removed, since
closure of the core language w.r.t. substitution can be considered a standard
(and therefore implicit) assumption.

The following lemmas hold for any core calculus satisfying the four
assumptions above. For reasons of space we have omitted all proofs (see [6]).

Lemma 1 (Coherence). If C → C′ then C →_c C′. Furthermore, if Γ ⊢ C:τ then
Γ′ ⊢_c C:τ for a certain core context Γ′ ⊆ Γ.

The coherence lemma states that the module calculus is a conservative ex- 
tension of the core calculus. In particular, coherence of the type system (second 
part) is needed for proving subject reduction; we conjecture that coherence of the 
reduction semantics (first part) is needed for proving soundness. The remaining 
two lemmas express standard properties. 

Lemma 2 (Weakening). If Γ ⊢ E:τ, then Γ′ ⊢ E:τ for any well-formed context
Γ′ s.t. Γ ⊆ Γ′.

Lemma 3 (Substitution). If Γ, x_i:τ_i^{i∈I} ⊢ E:τ and Γ ⊢ E_i:τ_i for all
i ∈ I, then Γ ⊢ E[x_i ↦ E_i^{i∈I}]:τ.

Theorem 1 (Subject Reduction). For any pair of terms E, E′, if E → E′ and
Γ ⊢ E:τ, then Γ ⊢ E′:τ.



3 Expressive Power of the Calculus 

In this section we show how various composition operators can be encoded in the 
calculus; we analyze in particular the link operator shown in the Introduction 
(3.1), parameterized modules (3.2) and object-oriented features (3.3). 



3.1 Linking Modules 

The link operator informally described in the Introduction has the following 
typing rule 
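
\[
\textrm{(link)}\quad
\frac{\begin{array}{l}
      \Gamma \vdash E_1 : [\Sigma^{\iota}, \Sigma_1^{\iota}, \Sigma_1^{f};\ \Sigma_1^{o}]
      \qquad \Gamma \vdash E_2 : [\Sigma^{\iota}, \Sigma_2^{\iota}, \Sigma_2^{f};\ \Sigma_2^{o}] \\
      \vdash [\Sigma^{\iota}, \Sigma_1^{\iota}, \Sigma_2^{\iota};\ \Sigma_1^{o}, \Sigma_2^{o}]
      \end{array}}
     {\Gamma \vdash \mathit{link}_{\sigma_1,\sigma_2}(E_1, E_2) : [\Sigma^{\iota}, \Sigma_1^{\iota}, \Sigma_2^{\iota};\ \Sigma_1^{o}, \Sigma_2^{o}]}
\quad
\begin{array}{l}
\sigma_1 : \Sigma_1^{f} \to \Sigma_2^{o} \\
\sigma_2 : \Sigma_2^{f} \to \Sigma_1^{o}
\end{array}
\]

(where Σ_1^f denotes the input components of E_1 which have to be bound to some
output components of E_2 as indicated by σ_1, and conversely), and can be easily
defined in terms of the basic operators as follows

link_{σ_1,σ_2}(E_1, E_2) = freeze_{σ_1}(freeze_{σ_2}(E_1 + E_2)).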

Alternatively, σ_1 and σ_2 could be implicitly specified by equality of
component names, as we have assumed for link(BOOL, INT) in the Introduction,
i.e. defining link(E_1, E_2) = link_{ι_1,ι_2}(E_1, E_2) with ι_i, i = 1, 2, the
obvious inclusions.

Note that this operator returns a concrete module only if each input component
of M_1 is mapped into an output component of M_2 and conversely.
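
The definition of link as a sum followed by two freezes can be tried out on a
pair of mutually recursive modules. The sketch below is illustrative: modules
are triples of dictionaries, expressions are opaque strings (so capture of free
variables is not checked), and the Even/Odd modules are an example invented
here rather than the BOOL/INT example of the Introduction.

```python
def module_sum(m1, m2):
    """Sum of two modules given as (iota, out, rho) triples of dicts."""
    (i1, o1, r1), (i2, o2, r2) = m1, m2
    if (set(i1) | set(r1)) & (set(i2) | set(r2)) or set(o1) & set(o2):
        raise ValueError("clash between binding variables or output names")
    return {**i1, **i2}, {**o1, **o2}, {**r1, **r2}


def freeze(sigma, module):
    """Bind each input component in dom(sigma) to the output it is sent to."""
    iota, out, rho = module
    bound = {x: out[sigma[n]] for x, n in iota.items() if n in sigma}
    rest = {x: n for x, n in iota.items() if n not in sigma}
    return rest, dict(out), {**rho, **bound}


def link(sigma1, sigma2, m1, m2):
    """link as two freezes applied to the sum of the modules."""
    return freeze(sigma1, freeze(sigma2, module_sum(m1, m2)))


even = ({"o": "Odd"}, {"Even": "λn. n = 0 or o(n − 1)"}, {})
odd = ({"e": "Even"}, {"Odd": "λn. n ≠ 0 and e(n − 1)"}, {})
# Renamings given by equality of component names (the obvious inclusions).
linked = link({"Odd": "Odd"}, {"Even": "Even"}, even, odd)
print(linked[0] == {})   # no input component is left: the module is concrete
```

If one of the two renamings is left empty, the corresponding input component
stays deferred and the result is not concrete, matching the remark above.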

Even though the link operator looks very natural as a way of assembling
modules, there are few concrete examples of module languages which support
this operator, allowing in practice mutually recursive definitions of modules.
The proposal which most directly uses an analogous operator is that of units
[12]. Basic units are very close to the basic modules of our calculus, since
they are, in their graphical representation, boxes with an import, an export
and an internal section (however, unlike our modules, units are run-time
entities with an initialization part). Many units can be composed by a linking
process which is graphically described by putting all the boxes inside a
collecting box and connecting some input to export ports by arrows. This
corresponds in our formalism to a composition of link operators plus a reduct
operation which performs the connections from/to ports of the collecting box.
Indeed, there is a natural graphical representation of all our operators over
modules, omitted here for reasons of space, which closely resembles that given
in [12] for units; the interested reader can refer to [5].

Other proposals for recursive modules are those in [11] for adding this
feature to Standard ML^ and the theoretical analysis in [10]; some comparison
with them is provided in the Conclusion.



3.2 Parameterized Modules and a Translation for the λ-Calculus

Module systems such as those of Standard ML [18] or Objective Caml [17] are
based on the idea of designing the module language as a small applicative
language of its own. Hence, modules are of two kinds: constant modules
(structures in ML terminology), which can be seen in our calculus as basic
modules without input components, and functions from modules into modules
(functors in ML terminology), which can be seen in our calculus as basic
modules whose input components are the expected components of the structure
which is the parameter of the functor and whose output components are those
defined by the functor itself.

^ The authors use the name mixins for their mutually recursive modules; we
prefer to reserve this name for modules which support both mutual recursion
and overriding with dynamic binding as in the object-oriented approach (see
3.3).

In these module systems, the only significant operation for composing modules
is function application, which can be encoded following the schema illustrated
in Fig. 4, where we show a translation of the λ-calculus into the module
calculus (both in the untyped version).

(var)     ≪x≫ = x
(lambda)  ≪λx.e≫ = [x ↦ Arg; Res ↦ ≪e≫; ]
(app)     ≪(e_1 e_2)≫ = (freeze_{Arg↦Arg}(≪e_1≫ + [; Arg ↦ ≪e_2≫; ])).Res

Fig. 4. Translation of the λ-calculus into the module calculus.



Interestingly enough, this translation can be defined by instantiating the
module calculus over the simplest core language we could choose: the language
of variables (recall that the only syntactic assumption on the core language
is that it must contain the set Var). This shows that the module language is a
powerful language of its own, regardless of the expressive power of the
underlying core language.
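
The three clauses of the translation are purely syntactic and can be written
down directly. In the sketch below the tuple-based term representations are an
assumption made for illustration; the component names Arg and Res are those of
Fig. 4.

```python
def translate(term):
    """Translate a λ-term into a module term, following (var), (lambda), (app).

    λ-terms:      ("var", x) | ("lam", x, body) | ("app", fun, arg)
    module terms: ("var", x) | ("basic", iota, out, rho)
                  | ("sum", e1, e2) | ("freeze", sigma, e) | ("select", e, name)
    """
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        _, x, body = term
        # The bound variable becomes a deferred variable on input Arg.
        return ("basic", {x: "Arg"}, {"Res": translate(body)}, {})
    if tag == "app":
        _, fun, arg = term
        argument = ("basic", {}, {"Arg": translate(arg)}, {})
        closed = ("freeze", {"Arg": "Arg"}, ("sum", translate(fun), argument))
        return ("select", closed, "Res")
    raise ValueError(f"not a λ-term: {term!r}")


identity = ("lam", "x", ("var", "x"))
print(translate(("app", identity, ("var", "y"))))
```

Only variables appear as core expressions in the output, in line with the
observation that the language of variables suffices as core language.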

We can verify that the α- and β-rules are valid under the translation.

For the α-rule we have that the term λx.e can be α-converted into λy.e[x ↦ y]
(with y ∉ AV(λx.e)) and then translated into

E_1 = [y ↦ Arg; Res ↦ ≪e[x ↦ y]≫; ].

On the other hand λx.e can be translated into [x ↦ Arg; Res ↦ ≪e≫; ] and then
α-converted into

E_2 = [y ↦ Arg; Res ↦ ≪e≫[x ↦ y]; ] (with y ∉ AV(≪λx.e≫)).

Now, by induction over the structure of λ-terms, trivially AV(e) = AV(≪e≫) and
≪e[x ↦ y]≫ = ≪e≫[x ↦ y] for any y ∉ AV(e), hence E_1 = E_2.

For the β-rule we have that the term (λx.e_1 e_2) can be β-converted into
e_1[x ↦ e_2] (assuming that AV(λx.e_1) ∩ FV(e_2) = ∅) and then translated into
E_1 = ≪e_1[x ↦ e_2]≫. Then, trivially by induction over the structure of
λ-terms, E_1 = ≪e_1≫[x ↦ ≪e_2≫]. On the other hand (λx.e_1 e_2) can be
translated into

(freeze_{Arg↦Arg}([x ↦ Arg; Res ↦ ≪e_1≫; ] + [; Arg ↦ ≪e_2≫; ])).Res

and then reduced to E_2 = ≪e_1≫[x ↦ [; Z ↦ ≪e_2≫; x ↦ ≪e_2≫].Z]. Now, since
trivially FV(e_2) = FV(≪e_2≫) and by hypothesis x ∉ FV(e_2), we have that the
term [; Z ↦ ≪e_2≫; x ↦ ≪e_2≫].Z reduces to ≪e_2≫, hence E_1 and E_2 are
observationally equivalent.



3.3 Object-Oriented Features 

In the examples considered so far, modules are essentially of two kinds:
modules with no input components (concrete modules), which can be effectively
used, and modules with some input components, which need to be combined with
other modules before being used. Indeed, the selection rule can be applied
only to concrete modules.

The key idea of the object-oriented approach w.r.t. modularity can be
considered to be the possibility it offers of writing modules (classes) which
combine the two features, i.e. where components (methods) are simultaneously
ready to be used via selection, i.e. are output components, and can be
modified by overriding in a way that changes the behavior of the components
referring to them, i.e. are input components (this is sometimes called the
open-closed property of the object-oriented approach). Components like these
are called virtual.

In other words, a module with virtual components has, intuitively, two dif- 
ferent semantics: an open semantics as a function, which is needed when the 
module is extended via overriding, and a closed semantics (the fixed point of the 
function), which is needed when the module is used via selection of a component 
(following the idea originally due to [9,19]). 

Virtual components can be encoded in our calculus in an indirect way, by
defining a (generalized) selection operator which, unlike the basic operators
of the calculus, takes into account the fact that a component name appears
both in the input and in the output assignment.

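Formally, this operation has the following typing rule

\[
\textrm{(generalized selection)}\quad
\frac{\Gamma \vdash E : [X_i{:}\tau_i^{\,i \in I};\ X_j{:}\tau_j^{\,j \in J}]}
     {\Gamma \vdash E \bullet X_k : \tau_k}
\quad k \in J,\ I \subseteq J
\]

and can be expressed by E • X = (freeze_ι(E)).X with ι the inclusion from
{X_i | i ∈ I} into {X_j | j ∈ J}.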
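
The open and closed readings of a module with virtual components can be
illustrated with a short sketch of generalized selection. Everything in the
encoding is an assumption made here: expressions are Python callables taking a
lookup function, the module named point is invented, and override is a toy
stand-in that merely replaces output definitions, not the overriding operator
of the calculus.

```python
def generalized_select(module, name):
    """E • X: freeze every input against the output of the same name, select X.

    module is a triple (iota, out, rho); expressions are callables taking a
    lookup function for variables.
    """
    iota, out, rho = module
    missing = set(iota.values()) - set(out)
    if missing:
        raise ValueError(f"inputs without a same-named output: {sorted(missing)}")
    closed = dict(rho)
    for x, component in iota.items():
        closed[x] = out[component]      # effect of freeze with the inclusion

    def var(x):
        return closed[x](var)

    return out[name](var)


def override(module, new_outputs):
    """Toy overriding: replace or add output definitions, keep the inputs."""
    iota, out, rho = module
    return iota, {**out, **new_outputs}, rho


point = (
    {"v": "Val"},
    {"Val": lambda var: 21, "Twice": lambda var: 2 * var("v")},
    {},
)
print(generalized_select(point, "Twice"))
print(generalized_select(override(point, {"Val": lambda var: 5}), "Twice"))
```

Because Twice refers to Val through the deferred variable v, redefining Val
before the module is closed changes the result of selecting Twice, which is the
behaviour expected of a virtual component.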

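A simple overriding operator has the following typing rule:

\[
\textrm{(overriding)}\quad
\frac{\Gamma \vdash E_1 : [\Sigma^{\iota}, \Sigma_1^{\iota};\ \Sigma^{o}, \Sigma_1^{o}]
      \qquad \Gamma \vdash E_2 : [\Sigma^{\iota}, \Sigma_2^{\iota};\ \Sigma^{o}, \Sigma_2^{o}]}
     {\Gamma \vdash E_1 \leftarrow E_2 : [\Sigma^{\iota}, \Sigma_1^{\iota}, \Sigma_2^{\iota};\ \Sigma^{o}, \Sigma_1^{o}, \Sigma_2^{o}]}
\]

(where Σ^o denotes the output components in E_1 which are overridden by those
in E_2) and can be expressed by E_1 ← E_2 = E_1|_{Σ_1^o} + E_2.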

An extended presentation of how to translate various overriding operators, 
including the super mechanism, in a module language supporting the three basic 
operators of sum, reduct and freeze can be found in [4]. We will say that a module
language supports mixin modules (or simply mixins) if it provides both mutual 
recursion (operators like sum or link) and overriding with dynamic binding, like 
the calculus defined in this paper. 

Note that, although methods of a parent and an heir class can refer to each
other, traditional object-oriented languages do not support mixins, since an
heir class cannot be used as a real module in the sense of the two principles
mentioned in the Introduction: it relies on a fixed parent class. Extensions
of object-oriented languages with mixins (also called mixin classes or
parametric heir classes in this case) are proposed in [7,13].

As a further illustration of how to encode object-oriented features, we show
an example of translation from Abadi and Cardelli's object calculus [1] into
our calculus (both in the untyped version). The example shows in particular
how to encode the self-reference mechanism (for a more general encoding see
[6]).

Consider the object defined by Cnt = [val = 0, inc = ς(s) s.val := s.val + 1].
The method val returns the current value of the counter, whereas inc returns
the counter itself with its value incremented by one.

The encoding of Cnt is given by the term C defined by

C = [v ↦ Val; Val ↦ 0, Inc ↦ (s ← [; Val ↦ v + 1; ]); s ↦ E],

E = [v ↦ Val; Val ↦ 0, Inc ↦ (s ← [; Val ↦ v + 1; ]); ].

The variable s corresponds to the self object. The term s ← [; Val ↦ v + 1; ]
is the translation of s.val := s.val + 1.

Since val is a virtual component of the object^, Val is both an input and an
output component, so that every redefinition of val must change the behavior
of the methods depending on it. Indeed, this is the case for the method inc,
whose definition in the encoding depends on the deferred variable v.

Since C is an open module, because of the input component Val, C must be
closed with the freeze operator before selecting an output component; in other
words, we have to use the generalized selection defined above. Therefore, the
method invocation Cnt.inc.val is encoded by the term (C • Inc) • Val, i.e.,
freeze_{Val↦Val}(freeze_{Val↦Val}(C).Inc).Val, which reduces, as expected, to 1.

4 Conclusion 

We have presented a simple and powerful calculus for module systems equipped
with a reduction semantics and a sound type system. Moreover, we have
illustrated that it can actually be used as a primitive kernel in which to
encode various existing mechanisms for combining software components. We have
also implemented a prototype interpreter for the calculus®.

An extended version of this paper, including proofs and the definition of a 
subtyping relation, is [6]. We have already discussed relations with some recent
proposals for advanced module systems in Sect. 3. Some further consideration is 
needed for comparing our calculus with work which more specifically deals with 
the problem of recursive type definitions spanning module boundaries, like the 
type-theoretical analysis in [10] in the context of the phase distinction formalism 
[15], or the ad-hoc proposal for Standard ML in [11]. 

From the point of view of our calculus, adding the possibility of type
definitions in modules requires an ad-hoc treatment. The basic problem is that
mutually recursive definitions of types cannot be left open to redefinition
(i.e., type components cannot be virtual, following the terminology used in
this paper), since the static correctness of other components may rely on
their current implementation. Hence, module operators must be refined in order
to handle type components in a special way: for instance, when summing two
modules the binding of deferred types of one module with corresponding defined
types of the other must always be implicitly performed, whereas other
components are explicitly bound by means of the freeze operator.

In this paper we have not considered type components, since here we are
mainly interested in defining a set of both powerful and simple primitive
module operators. We have already developed a categorical approach to the
denotational semantics of modules dealing with types in [5] and a concrete
module language built on top of a simple functional language in [2] (Chapter
5). Hence, defining an enriched calculus with type components will be the most
important and immediate subject of further work; we expect that more
ingredients than those required in this paper will be needed at the core
level, e.g. a syntax for type definitions, or for type constraints (see [3])
if we want to take into account a more flexible approach allowing types to be
“partially” specified.

Other interesting research directions are a further study of the properties of
the calculus, like soundness, and a more accurate comparison with other basic
calculi as outlined in Sect. 3.

^ For the sake of simplicity we assume the method inc to be non-virtual.

® See http://www.disi.unige.it/person/AnconaD/Java/UPCMS.html.

Abstract. Two of the distinguishing features of the Standard ML modules
language are its term dependent type syntax and the use of type generativity
in its static semantics. From a type-theoretic perspective, the former
suggests that the language involves first-order dependent types, while the
latter has been regarded as an extra-logical device that bears no direct
relation to type-theoretic constructs. We reformulate the existing semantics
of Standard ML modules to reveal a purely second-order type theory. In
particular, we show that generativity corresponds precisely to existential
quantification over types and that the remainder of the modules type structure
is based exclusively on the second-order notions of type parameterisation,
universal type quantification and subtyping. Our account is more direct than
others and has been shown to scale naturally to both higher-order and
first-class modules.



1 Introduction 

Standard ML [14] comprises two programming languages: the Core language ex- 
presses details of algorithms and data structures; the Modules language expresses 
the modular architecture of a software system. In Modules, Core language defini- 
tions of type and term identifiers can be packaged together into possibly nested 
terms called structures. Access to structure components is by the dot notation 
and provides good control of the name space in a large program. 

The use of the dot notation to project types from terms suggests that the 
type structure of Standard ML is based on first-order dependent types. In this
interpretation, proposed in [11] and refined in [3], nested structures are modelled
as dependent pairs whose types are first-order existentially quantified types. 
Standard ML functors, that define functions mapping structures to structures, 
are modelled using dependent functions whose types are first-order universally 
quantified types. Adopting standard first-order dependent types in a program- 
ming language is problematic as it rules out the consistent extension to first-class 
modules [3] without introducing undecidable type checking. In [4], the authors 
observe that standard dependent types also violate the phase distinction between 

* This research has been partially supported by EPSRC grant GR/K63795 

Core Types             u ::= t                    type identifier
                           | u → u′               function type
                           | int                  integers
                           | sp.t                 type projection

Signature Bodies       B ::= type t = u; B        transparent type specification
                           | type t; B            opaque type specification
                           | val x : u; B         value specification
                           | structure X : S; B   structure specification
                           | ε_B                  empty body

Signature Expressions  S ::= sig B end            encapsulated body

Fig. 1. Type Syntax of Mini-SML



compile-time type checking and run-time evaluation and propose a non-standard 
interpretation of dependent types that preserves the phase distinction and has 
decidable typing, but fails to account for other significant features of Modules, 
namely: type generativity, named structure components and subtyping on struc- 
tures. More recently proposed module calculi [2,5,7,8,9,10] that capture most, 
but not all, of the features of Standard ML, and significantly generalise them, 
also resort to non-standard formulations of dependent types. Though enjoying 
a phase distinction, these calculi have other undesirable properties (undecidable 
subtyping in the presence of first-class modules [2,10]; no principal types in the 
presence of higher-order functors [2,5,7,8,10]). 

In this paper, we take a second look at the type structure of Standard ML 
Modules by studying a representative toy language, Mini-SML. The static se- 
mantics of Mini-SML is based directly on that of Standard ML, but our choice 
of notation reveals an underlying type structure that, despite the term depen- 
dent type syntax, is based entirely on the simpler, second-order notions of type 
parameterisation, universal type quantification and subtyping. What remains to 
be explained is the role of type generativity in the semantics, that lends it a pro- 
cedural, non type-theoretic flavour by requiring a global state of generated types 
to be maintained and updated during type checking. We explain and eliminate 
generativity by presenting an alternative, but equivalent, static semantics based 
on the introduction and elimination of second-order existential types [13], thus 
accounting for all of Mini-SML’s type structure in a purely second-order type 
theory. 

2 Syntax 

Mini-SML includes the essential features of Standard ML Modules but, for pre- 
sentation reasons, is constructed on top of a simple Core language of explicitly 
typed, monomorphic functions. The author’s thesis [15], on which this paper is 
based, presents similar results for a generic Core language that may be instan- 
tiated to Standard ML’s Core (which supports the definition of parameterised 
types, is implicitly typed, and polymorphic). The type and term syntax of
Mini-SML is defined by the grammar in Figures 1 and 2, where t ∈ TypId, x ∈ ValId,

Core Expressions       e  ::= x                              value identifier
                            | λx : u.e                       function
                            | e e′                           application
                            | i                              integer constant
                            | sp.x                           value projection

Structure Paths        sp ::= X                              structure identifier
                            | sp.X                           structure projection

Structure Bodies       b  ::= type t = u; b                  type definition
                            | datatype t = u with x, x′; b   datatype definition
                            | val x = e; b                   value definition
                            | structure X = s; b             structure definition
                            | local X = s in b               local structure definition
                            | functor F (X : S) = s in b     functor definition
                            | ε_b                            empty body

Structure Expressions  s  ::= sp                             structure path
                            | struct b end                   structure body
                            | F(s)                           functor application
                            | s : S                          transparent constraint
                            | s :> S                         opaque constraint

Fig. 2. Term Syntax of Mini-SML



X G Strld, and F G Funid range over disjoint sets of type, value, structure and 
functor identifiers. 

A core type u may be used to define a type identifier or to specify the type of 
a Core value. These are just the types of a simple functional language, extended 
with the projection sp.t of a type component from a structure path. A signature 
body B is a sequential specification of a structure’s components. A type com- 
ponent may be specified transparently, by equating it with a type, or opaquely, 
permitting a variety of implementations. Value and structure components are 
specified by their type and signature. The specifications in a body are depen- 
dent in that subsequent specifications may refer to previous ones. A signature 
expression S merely encapsulates a body. A structure matches a signature ex- 
pression if it provides an implementation for all of the specified components, and 
possibly more. 

Core expressions e describe a simple functional language extended with the 
projection of a value identifier from a structure path. A structure path sp is a 
reference to a bound structure identifier or the projection of one of its substruc- 
tures. A structure body b is a sequence of definitions: subsequent definitions in 
the body may refer to previous ones. A type definition abbreviates a type. A 
datatype definition generates a new (recursive) type with value constructor x and 
value destructor x'. Value, structure and local definitions bind term identifiers 
to the values of expressions. A functor definition introduces a named function on 
structures: X is the functor’s formal argument, S specifies the argument’s type, 
and s is the functor’s body that may refer to X. The functor may be applied 
to any argument that matches S. A structure expression s evaluates to a
structure. It may be a path or an encapsulated structure body, whose type, value
and structure definitions become the components of the structure. The application
of a functor evaluates its body with respect to the value of the actual argument,
generating any new types created by the body. A transparent constraint restricts
the visibility of the structure’s components to those specified in the signature,
which the structure must match, but preserves the actual implementations of
type components with opaque specifications. An opaque constraint is similar,
but generates new, and thus abstract, types for type components with opaque
specifications.

$$
\begin{array}{rcll}
\alpha \in \mathit{Var} & = & \{\alpha, \beta, \delta, \gamma, \ldots\} & \text{type variables} \\
M, N, P, Q \in \mathit{VarSet} & = & \mathrm{Fin}(\mathit{Var}) & \text{sets of type variables} \\
u \in \mathit{Type} & ::= & \alpha & \text{type variable} \\
 & \mid & u \to u' & \text{function space} \\
 & \mid & \mathtt{int} & \text{integers} \\
\varphi \in \mathit{Real} & = & \mathit{Var} \xrightarrow{\text{fin}} \mathit{Type} & \text{realisations} \\
\mathcal{S} \in \mathit{Str} & \stackrel{\text{def}}{=} &
  \left\{ \mathcal{S}_t \cup \mathcal{S}_x \cup \mathcal{S}_X \;\middle|\;
  \begin{array}{l}
    \mathcal{S}_t \in \mathit{TypId} \xrightarrow{\text{fin}} \mathit{Type}, \\
    \mathcal{S}_x \in \mathit{ValId} \xrightarrow{\text{fin}} \mathit{Type}, \\
    \mathcal{S}_X \in \mathit{StrId} \xrightarrow{\text{fin}} \mathit{Str}
  \end{array} \right\} & \text{semantic structures} \\
\mathcal{L} & ::= & \Lambda P.\mathcal{S} & \text{semantic signatures} \\
\mathcal{X} & ::= & \exists P.\mathcal{S} & \text{existential structures} \\
\mathcal{F} & ::= & \forall P.\mathcal{S} \to \mathcal{X} & \text{semantic functors} \\
C & & \text{(finite maps from identifiers to semantic objects)} & \text{semantic contexts}
\end{array}
$$

Fig. 3. Semantic Objects of Mini-SML

Standard ML only permits functor definitions in the top-level syntax. Mini- 
SML allows local functor definitions in structure bodies, which can now serve as 
the top-level: this generalisation avoids the need for a separate top-level syntax. 



3 Semantic Objects 

Following Standard ML [14], the static semantics of Mini-SML distinguishes 
between the syntactic types of the language and their semantic counterparts 
called semantic objects. Semantic objects play the role of types in the static 
semantics. Figure 3 defines the semantic objects of Mini-SML. We let O range 
over all semantic objects. 

Notation: For sets $A$ and $B$, $\mathrm{Fin}(A)$ denotes the set of finite subsets
of $A$, and $A \xrightarrow{\text{fin}} B$ denotes the set of finite maps from $A$ to
$B$. Let $f$ and $g$ be finite maps. $\mathcal{D}(f)$ denotes the domain of definition
of $f$. The finite map $f + g$ has domain $\mathcal{D}(f) \cup \mathcal{D}(g)$ and values
$(f + g)(a) \stackrel{\text{def}}{=} \text{if } a \in \mathcal{D}(g) \text{ then } g(a) \text{ else } f(a)$.

Type variables $\alpha \in \mathit{Var}$ range over semantic types $u \in \mathit{Type}$.
The latter are the semantic counterparts of syntactic Core types and are used to
record the denotations of type identifiers and the types of value identifiers.

A realisation $\varphi \in \mathit{Real}$ maps type variables to semantic types and
defines a substitution on type variables in the usual way. The operation of applying
a realisation $\varphi$ to an object $O$ is written $\varphi(O)$.
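
The finite-map sum and the application of a realisation can be made concrete with a
short Python sketch. The encoding is an assumption made only for illustration and is
not part of Mini-SML: a type variable is a string, "int" is the integer type, and a
function type u → u' is the tuple ("->", u, u'). The names `override` and
`apply_real` are hypothetical.

```python
def override(f, g):
    """Finite-map sum f + g: on a shared key the entry of g wins."""
    return {**f, **g}


def apply_real(phi, u):
    """Apply the realisation phi (dict: variable -> type) to the semantic type u."""
    if isinstance(u, tuple):              # function type ("->", dom, cod)
        _, dom, cod = u
        return ("->", apply_real(phi, dom), apply_real(phi, cod))
    # "int" and variables outside the domain of phi are left unchanged
    return phi.get(u, u)


phi = {"a": "int", "b": ("->", "int", "int")}
realised = apply_real(phi, ("->", "a", ("->", "b", "c")))
summed = override({"t": "a"}, {"t": "int", "x": "a"})
```

Here `realised` replaces a and b but leaves the variable c, which lies outside the
domain of the realisation, untouched.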

Semantic structures $\mathcal{S} \in \mathit{Str}$ are used as the types of structure
identifiers and paths. A semantic structure maps type components to the types they
denote, and value and structure components to the types they inhabit. For clarity, we
define the extension functions
$t \triangleright u, \mathcal{S} \stackrel{\text{def}}{=} \{t \mapsto u\} + \mathcal{S}$,
$x : u, \mathcal{S} \stackrel{\text{def}}{=} \{x \mapsto u\} + \mathcal{S}$, and
$X : \mathcal{S}, \mathcal{S}' \stackrel{\text{def}}{=} \{X \mapsto \mathcal{S}\} + \mathcal{S}'$,
and let $\epsilon_{\mathcal{S}}$ denote the empty structure $\emptyset$.

Note that $\Lambda$, $\exists$ and $\forall$ bind finite sets of type variables.

A semantic signature $\Lambda P.\mathcal{S}$ is a parameterised type: it describes the
family of structures $\varphi(\mathcal{S})$, for $\varphi$ a realisation of the
parameters in $P$.

The existential structure $\exists P.\mathcal{S}$, on the other hand, is a quantified
type: variables in $P$ are existentially quantified in $\mathcal{S}$ and thus abstract.

A semantic functor $\forall P.\mathcal{S} \to \mathcal{X}$ describes the type of a
functor identifier: the universally quantified variables in $P$ are bound
simultaneously in the functor’s domain, $\mathcal{S}$, and its range, $\mathcal{X}$.
These variables capture the type components of the domain on which the functor
behaves polymorphically; their possible occurrence in the range caters for the
propagation of type identities from the functor’s actual argument: functors are
polymorphic functions on structures. The range $\mathcal{X}$ of a functor is an
existential structure $\mathcal{X} = \exists Q.\mathcal{S}'$. $Q$ is the functor’s set
of generative type variables, as described in the Definition of Standard ML [14].
When a functor with this range is applied, the type of the result is a variant of
$\mathcal{S}'$, obtained by replacing variables in $Q$ with new, generative variables.

The Definition of Standard ML [14] is decidedly non-committal in its choice 
of binding operators, using the uniform notation of parenthesised variable sets 
to indicate binding in semantic objects. We prefer to differentiate binders with
the more suggestive notation $\Lambda$, $\forall$ and $\exists$.

A context $C$ is a finite map mapping type identifiers to the semantic types they
denote, and value, structure and functor identifiers to the types they inhabit.
For clarity, we define the extension functions
$C, t \triangleright u \stackrel{\text{def}}{=} C + \{t \mapsto u\}$,
$C, x : u \stackrel{\text{def}}{=} C + \{x \mapsto u\}$,
$C, X : \mathcal{S} \stackrel{\text{def}}{=} C + \{X \mapsto \mathcal{S}\}$, and
$C, F : \mathcal{F} \stackrel{\text{def}}{=} C + \{F \mapsto \mathcal{F}\}$.

We let $\mathcal{V}(O)$ denote the set of variables occurring free in $O$, where the
notions of free and bound variable are defined as usual. Furthermore, we identify
semantic objects that differ only in a renaming of bound type variables
($\alpha$-conversion).

The operation of applying a realisation to a type (substitution) is extended 
to all semantic objects in the usual way, taking care to avoid the capture of free 
variables by bound variables. 

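Fig. 4. Common Denotation and Classification Judgements

Definition 1 (Enrichment Relation) Given two structures $\mathcal{S}$ and
$\mathcal{S}'$, $\mathcal{S}$ enriches $\mathcal{S}'$, written
$\mathcal{S} \succeq \mathcal{S}'$, if and only if
$\mathcal{D}(\mathcal{S}) \supseteq \mathcal{D}(\mathcal{S}')$, and

- for all $t \in \mathcal{D}(\mathcal{S}')$, $\mathcal{S}(t) = \mathcal{S}'(t)$,
- for all $x \in \mathcal{D}(\mathcal{S}')$, $\mathcal{S}(x) = \mathcal{S}'(x)$, and
- for all $X \in \mathcal{D}(\mathcal{S}')$, $\mathcal{S}(X) \succeq \mathcal{S}'(X)$.

Enrichment is a pre-order that defines a subtyping relation on semantic structures
(i.e. $\mathcal{S}$ is a subtype of $\mathcal{S}'$ if and only if
$\mathcal{S} \succeq \mathcal{S}'$).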
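
Enrichment (Definition 1) and the signature matching relation built on it
(Definition 3) can be prototyped together. The following Python sketch is an
illustration under stated assumptions, not the paper's algorithm: structures are
dictionaries with three finite maps, types are encoded as strings and
("->", dom, cod) tuples, and the realisation is found by the hypothetical helper
`infer_realisation`, which assumes every signature parameter is the denotation of
some opaque type component and can therefore be read off the candidate structure.

```python
def struct(types=None, vals=None, strs=None):
    """A semantic structure: finite maps for type, value and structure components."""
    return {"types": types or {}, "vals": vals or {}, "strs": strs or {}}


def enriches(s, s2):
    """Definition 1: s has every component of s2, with equal types and,
    recursively, enriching substructures."""
    return (all(s["types"].get(t) == u for t, u in s2["types"].items())
            and all(s["vals"].get(x) == u for x, u in s2["vals"].items())
            and all(X in s["strs"] and enriches(s["strs"][X], sub)
                    for X, sub in s2["strs"].items()))


def apply_real(phi, u):
    """Apply a realisation to a semantic type."""
    if isinstance(u, tuple):
        return ("->", apply_real(phi, u[1]), apply_real(phi, u[2]))
    return phi.get(u, u)


def apply_real_str(phi, s):
    """Extend realisation application to semantic structures."""
    return struct({t: apply_real(phi, u) for t, u in s["types"].items()},
                  {x: apply_real(phi, u) for x, u in s["vals"].items()},
                  {X: apply_real_str(phi, sub) for X, sub in s["strs"].items()})


def infer_realisation(params, s, body, phi):
    """Assumption of this sketch: read each parameter off the type component
    of s that the signature body specifies opaquely."""
    for t, u in body["types"].items():
        if isinstance(u, str) and u in params and t in s["types"]:
            phi.setdefault(u, s["types"][t])
    for X, sub in body["strs"].items():
        if X in s["strs"]:
            infer_realisation(params, s["strs"][X], sub, phi)
    return phi


def matches(s, params, body):
    """Definition 3: return phi with domain params such that s enriches
    phi(body), or None if this sketch finds no such realisation."""
    phi = infer_realisation(params, s, body, {})
    if set(phi) == set(params) and enriches(s, apply_real_str(phi, body)):
        return phi
    return None


# Signature  Lambda {a}. (t |> a, x : a)
sig_params, sig_body = {"a"}, struct(types={"t": "a"}, vals={"x": "a"})
good = struct(types={"t": "int"}, vals={"x": "int", "y": "int"})
bad = struct(types={"t": "int"}, vals={"x": ("->", "int", "int")})
```

The structure `good` matches by realising a as int and carries an extra component y,
which enrichment permits; `bad` fails because its x does not have the realised type.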

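Definition 2 (Functor Instantiation) A semantic functor
$\forall P.\mathcal{S} \to \mathcal{X}$ instantiates to a functor instance
$\mathcal{S}' \to \mathcal{X}'$, written
$\forall P.\mathcal{S} \to \mathcal{X} > \mathcal{S}' \to \mathcal{X}'$, if and only if
$\varphi(\mathcal{S}) = \mathcal{S}'$ and $\varphi(\mathcal{X}) = \mathcal{X}'$, for some
realisation $\varphi$ with $\mathcal{D}(\varphi) = P$.

Definition 3 (Signature Matching) A semantic structure $\mathcal{S}$ matches a
signature $\Lambda P.\mathcal{S}'$ if and only if there exists a realisation $\varphi$
with $\mathcal{D}(\varphi) = P$ such that $\mathcal{S} \succeq \varphi(\mathcal{S}')$.

Structure bodies: $C, N \vdash b : \mathcal{S} \Rightarrow M$

$$\frac{C \vdash \mathtt{u} \triangleright u \qquad (C, t \triangleright u), N \vdash b : \mathcal{S} \Rightarrow M}
       {C, N \vdash (\mathtt{type}\ t = \mathtt{u};\ b) : (t \triangleright u, \mathcal{S}) \Rightarrow M}$$

$$\frac{\alpha \notin N \qquad C, t \triangleright \alpha \vdash \mathtt{u} \triangleright u \qquad
        (C, t \triangleright \alpha, x : u \to \alpha, x' : \alpha \to u), N \cup \{\alpha\} \vdash b : \mathcal{S} \Rightarrow M}
       {C, N \vdash (\mathtt{datatype}\ t = \mathtt{u}\ \mathtt{with}\ x, x';\ b) :
        (t \triangleright \alpha, x : u \to \alpha, x' : \alpha \to u, \mathcal{S}) \Rightarrow \{\alpha\} \cup M} \quad (3)$$

$$\frac{C \vdash e : u \qquad (C, x : u), N \vdash b : \mathcal{S} \Rightarrow M}
       {C, N \vdash (\mathtt{val}\ x = e;\ b) : (x : u, \mathcal{S}) \Rightarrow M}$$

$$\frac{C, N \vdash s : \mathcal{S} \Rightarrow P \qquad (C, X : \mathcal{S}), N \cup P \vdash b : \mathcal{S}' \Rightarrow Q}
       {C, N \vdash (\mathtt{structure}\ X = s;\ b) : (X : \mathcal{S}, \mathcal{S}') \Rightarrow P \cup Q} \quad (4)$$

$$\frac{C \vdash \mathtt{S} \triangleright \Lambda P.\mathcal{S} \qquad P \cap N = \emptyset \qquad
        (C, X : \mathcal{S}), N \cup P \vdash s : \mathcal{S}' \Rightarrow Q \qquad \cdots}
       {\cdots}$$

Fig. 5. Generative Classification Judgements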



4 Static Semantics 

In this section we introduce two distinct static semantics for structure bodies 
and structure expressions. The systems rely on shared judgement forms relating 
Core types, signature bodies and signature expressions to their denotations, and 
Core expressions and structure paths to their types. The common judgements 
are shown in Figure 4. We can factor out these judgements because they do not 
generate any new free variables in their conclusions. Observe that the opaque 
type specifications in a signature expression give rise to the type parameters of 
the semantic signature it denotes (Rule 2).

4.1 Generative Semantics

Figure 5 presents a static semantics for structure bodies and expressions that 
employs generative classification judgements in the style of Standard ML [14]. 

Consider the form of the judgements $C, N \vdash b : \mathcal{S} \Rightarrow M$ and
$C, N \vdash s : \mathcal{S} \Rightarrow M$. The set of type variables $N$ is meant to
capture a superset of the variables generated so far. Classification produces,
besides the semantic object $\mathcal{S}$, the set of variables $M$ generated during
the classification of the phrase b or s. The variable sets are threaded through
classification trees in a global, state-like manner. This avoids any unsafe
confusion of existing variables with the fresh variables generated by datatype
definitions (Rule 3), functor applications (Rule 5), and opaque constraints
(Rule 6). The generative nature of classification is expressed by the following
property:

Property 1 (Generativity) If $C, N \vdash b/s : \mathcal{S} \Rightarrow M$ then
$N \cap M = \emptyset$.¹

Note that the sets of generated variables are not redundant. Suppose we
deleted them from the classification judgements and replaced occurrences of $N$
by $\mathcal{V}(C)$, so that variables are generated to be fresh with respect to just
the current context instead of the state. The resulting semantics would be unsound.

For a counterexample, consider the following phrase:

    structure X = struct datatype t = int with c, d end;
    structure Y = struct structure X = struct end;
                         datatype u = int → int with c, d
                  end;
    val x = (Y.d (X.c 1)) 2

This phrase is unsound because the definition of x leads to the sad attempt 
of applying 1 to 2. The phrase should be rejected by a sound static semantics. 

In the putatively simpler, state-less semantics, we only require that the type 
variables chosen for t and u are distinct from the variables free in the context 
of their respective definitions. The annotated phrase shows what can go wrong: 

    ^∅[ structure X = struct ^∅[ datatype t^α = int with c, d end;
    ^{α}[ structure Y = struct ^{α}[ structure X = struct end;
                               ^∅[ datatype u^α = int → int with c, d
                        end;
    ^{α}[ val x = (Y.d^(α→int→int) (X.c^(int→α) 1)^α)^(int→int) 2

Assuming an initially empty context, we have annotated the beginning of each
structure body b with the set $N$ of variables free in the local context, using
the notation $^{N}[\,b$; the defining occurrences of t and u are decorated with their
denotations, and key subphrases with their types. The problem is that t and u
are assigned the same type variable $\alpha$, even though they must be distinguished.
The problem arises because $\alpha$, already set aside for t, no longer occurs free in
the local context at the definition of u: the free occurrence is eclipsed by the
shadow of the second definition of X. Thus the semantics may again choose $\alpha$
to represent u, and incorrectly accept the definition of x.

¹ When P is a predicate, we use the abbreviation P(b/s) to mean P(b) and P(s).

The generative semantics that uses a state maintains soundness as follows: 

    ^∅↓ structure X = struct ^∅↓ datatype t^α = int with c, d ↑^{α} end;
    ^{α}↓ structure Y = struct ^{α}↓ structure X = struct end;
                               ^{α}↓ datatype u^β = int → int with c, d ↑^{β} ↑^{β}
                        end;
    ^{α,β}↓ val x = (Y.d^(β→int→int) (X.c^(int→α) 1)^α) 2
                                     ~~~~~~~~~~~~~~~~~

Assuming an initially empty context and state, we have indicated, at the
beginning of each structure body b, the state $N$ of variables generated so far,
and, at its end, the variables $M$ generated during its classification. We use the
notation $^{N}\!\downarrow b \uparrow^{M}$, corresponding to a classification
$\ldots, N \vdash b : \ldots \Rightarrow M$. Observe that generated variables are
accumulated in the state as we traverse the phrase. At the definition of u,
$\alpha$ is recorded in the state, even though it no longer occurs free in the
current context, forcing the choice of a distinct variable $\beta$. In turn, this
leads to the detection of the type violation, which is underlined.

These observations motivate: 

Definition 4 (Rigidity) $C$ is rigid w.r.t. $N$, written $C, N$ rigid, if and only if
$\mathcal{V}(C) \subseteq N$.

As long as we start with C,N rigid, as a consequence of Property 1, those 
variables in M resulting from the classification of b and s will never be confused 
with variables visible in the context, even if these are temporarily hidden by 
bindings added to C during sub-classifications. 
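
The contrast between the two ways of choosing fresh variables can be replayed in a
few lines of Python. This is only a simulation of the counterexample above, written
under simplifying assumptions: contexts are nested dictionaries, types are strings
or ("->", dom, cod) tuples, and the hypothetical function `fresh` deterministically
picks the first unused letter, which makes the clash visible rather than merely
possible.

```python
import string


def free_vars(obj):
    """Type variables occurring in a type, a structure or a context."""
    if isinstance(obj, dict):
        return set().union(*(free_vars(v) for v in obj.values()))
    if isinstance(obj, tuple):                       # ("->", dom, cod)
        return free_vars(obj[1]) | free_vars(obj[2])
    return set() if obj == "int" else {obj}


def fresh(avoid):
    """First variable name that does not occur in the set avoid."""
    return next(v for v in string.ascii_lowercase if v not in avoid)


def classify_example(stateful):
    """Choose variables for t and u in the counterexample phrase and report
    whether the definition of x would be accepted."""
    context, state = {}, set()
    # structure X = struct datatype t = int with c, d end;
    t = fresh(state if stateful else free_vars(context))
    state |= {t}
    context = {"X": {"t": t, "c": ("->", "int", t), "d": ("->", t, "int")}}
    # structure Y = struct structure X = struct end; datatype u = ... end;
    inner = {**context, "X": {}}                     # the inner X shadows the outer X
    u = fresh(state if stateful else free_vars(inner))
    state |= {u}
    # val x = (Y.d (X.c 1)) 2 : Y.d expects an argument of type u, X.c 1 has type t
    accepted = (t == u)
    return t, u, accepted


stateless_result = classify_example(stateful=False)
stateful_result = classify_example(stateful=True)
```

In the state-less run both datatypes receive the same variable and the unsound
definition is accepted; threading the state forces a second variable and the
definition is rejected.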

A similar example motivates the generativity of functor application. Consider 
this unsound phrase that applies 1 to 2 in the definition of x: 

    functor F(X : sig type t end) = struct datatype u = X.t with c, d end
    in
      structure Y = F(struct type t = int end);
      structure Z = F(struct type t = int → int end);
      val x = (Z.d (Y.c 1)) 2

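$$\frac{C \vdash s : \exists P.\mathcal{S} \qquad P \cap \mathcal{V}(C) = \emptyset \qquad
        C, X : \mathcal{S} \vdash b : \exists Q.\mathcal{S}' \qquad Q \cap (P \cup \mathcal{V}(\mathcal{S})) = \emptyset}
       {C \vdash (\mathtt{structure}\ X = s;\ b) : \exists P \cup Q.(X : \mathcal{S}, \mathcal{S}')} \quad (8)$$

$$\frac{C \vdash \mathtt{S} \triangleright \Lambda P.\mathcal{S} \qquad P \cap \mathcal{V}(C) = \emptyset \qquad
        C, X : \mathcal{S} \vdash s : \mathcal{X} \qquad C, F : \forall P.\mathcal{S} \to \mathcal{X} \vdash b : \mathcal{X}'}
       {C \vdash (\mathtt{functor}\ F\ (X : \mathtt{S}) = s\ \mathtt{in}\ b) : \mathcal{X}'}$$

$$\frac{\cdots}{C \vdash F(s) : \exists P \cup Q.\mathcal{S}''}$$

$$\frac{C \vdash s : \exists P.\mathcal{S} \qquad C \vdash \mathtt{S} \triangleright \Lambda Q.\mathcal{S}' \qquad
        P \cap \mathcal{V}(\Lambda Q.\mathcal{S}') = \emptyset \qquad \mathcal{S} \succeq \varphi(\mathcal{S}') \qquad \mathcal{D}(\varphi) = Q}
       {C \vdash (s : \mathtt{S}) : \exists P.\varphi(\mathcal{S}')}$$

$$\frac{C \vdash s : \exists P.\mathcal{S} \qquad C \vdash \mathtt{S} \triangleright \Lambda Q.\mathcal{S}' \qquad
        P \cap \mathcal{V}(\Lambda Q.\mathcal{S}') = \emptyset \qquad \mathcal{S} \succeq \varphi(\mathcal{S}') \qquad \mathcal{D}(\varphi) = Q}
       {C \vdash (s :> \mathtt{S}) : \exists Q.\mathcal{S}'}$$

Fig. 6. Type-Theoretic Classification Judgements

In a naive semantics with applicative (non-generative) functors, we would
simply add the generative variables returned by a functor’s body to the state
at the functor’s definition, omitting the generation of fresh variables each time
it is applied. Then each application of the functor would return equivalent
abstract types. In our example, this means that the types Y.u and Z.u would be
identified, allowing the unsound definition of x to be accepted. In the generative
semantics, each application of F returns new types, so that Y.u and Z.u are
distinguished and the definition of x is correctly rejected. Observe that the
definition of u depends on the functor argument’s opaque type component t, whose
realisation can vary with each application of F. The naive applicative semantics
for functors is unsound because it does not take account of this dependency;
the generative semantics does. (Note that the less naive semantics of applicative
functors given in [8] is sound because the abstract types returned by a functor
application are expressed as a function of the functor’s actual term argument.)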
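
The difference between the naive applicative reading and the generative one can be
simulated for the functor F above. The Python sketch below is illustrative only: it
hard-codes an assumed semantic type for F, namely
∀{p}.(t ▷ p) → ∃{q}.(u ▷ q, c : p → q, d : q → p), and all function names are
hypothetical.

```python
import itertools


def subst(phi, u):
    """Apply a realisation to a type encoded as a string or ("->", dom, cod)."""
    if isinstance(u, tuple):
        return ("->", subst(phi, u[1]), subst(phi, u[2]))
    return phi.get(u, u)


# Assumed range of F: exists {q}. (u |> q, c : p -> q, d : q -> p)
RANGE_BOUND = ["q"]
RANGE_BODY = {"u": "q", "c": ("->", "p", "q"), "d": ("->", "q", "p")}


def apply_functor(arg_t, fresh_names=None):
    """Instantiate p with the argument's type t. With fresh_names, also replace
    the generative variables of the range by new ones (generative semantics)."""
    phi = {"p": arg_t}
    if fresh_names is not None:
        phi.update({q: next(fresh_names) for q in RANGE_BOUND})
    return {name: subst(phi, u) for name, u in RANGE_BODY.items()}


def accepts_x(Y, Z):
    """val x = (Z.d (Y.c 1)) 2 type-checks only if the result type of Y.c
    equals the argument type of Z.d."""
    return Y["c"][2] == Z["d"][1]


def run(generative):
    names = (f"g{i}" for i in itertools.count()) if generative else None
    Y = apply_functor("int", names)
    Z = apply_functor(("->", "int", "int"), names)
    return accepts_x(Y, Z)


naive_accepts = run(generative=False)
generative_accepts = run(generative=True)
```

Without fresh variables both applications return the same abstract type q, so the
ill-typed definition of x slips through; with fresh variables Y.u and Z.u differ and
the definition is rejected.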

4.2 Type-Theoretic Semantics 

To a type theorist, the generative judgements appear odd. The intrusion of the 
state imposes a procedural ordering on the premises of the generative rules, which
contrasts with the declarative, compositional formulation of typing rules in
Type Theory. The fact that the type of the term may contain “new” free type
variables that do not occur free in the context is peculiar (conventional type
theories enjoy the free variable property: the type of a term is closed with re- 
spect to the variables occurring free in the context). Perhaps for this reason, 
generativity has developed its own mystique and its own terminology. In Stan- 
dard ML [14], type variables are called “type names” to stress their persistent, 
generative nature. Generativity is often presented as an extra-logical device, use- 
ful for programming language type systems, but distinct from more traditional 
type-theoretic constructs. In this section, we show how to replace the generative 
judgements by more declarative, type-theoretic ones. 

Figure 6 presents an alternative static semantics for structure bodies and
expressions, defined by the judgements $C \vdash b : \mathcal{X}$ and
$C \vdash s : \mathcal{X}$. Rather than maintaining a global state of variables
threaded through classifications, we classify structure bodies and expressions
using existential structures.

The key idea is to replace global generativity with the introduction and elim- 
ination of existential types — in essence: local generativity. In the rules, the side 
conditions on bound variables prevent capture of free variables in the usual way. 
Because they are bound, the variables can always be renamed to satisfy the side 
conditions. For intuition, we explain some of the rules: 

(datatype t = u with x, x'; b): The denotation of u is determined in the
context extended with the recursive assumption that t denotes $\alpha$, where
$\alpha$ is a hypothetical type represented by a variable that is fresh for $C$.
This determines the types of the constructor x and the destructor x' that are added
to the context before classifying the body b. Provided b has existential structure
$\exists P.\mathcal{S}$, which may contain occurrences of $\alpha$, we conceptually
eliminate the existential quantification over $\mathcal{S}$, introducing the
hypothetical types $P$, extend the record of components t, x and x' by $\mathcal{S}$
and then existentially quantify over both the hypothetical type $\alpha$ and the
hypothetical types $P$ we just introduced.

(structure X = s; b): Provided s has existential structure $\exists P.\mathcal{S}$,
we eliminate the existential, introducing the hypothetical types $P$, and classify b
in the context extended with the assumption $X : \mathcal{S}$ to obtain the
existential structure $\exists Q.\mathcal{S}'$ of b. Now $\exists Q.\mathcal{S}'$ may
contain some of the hypothetical types in $P$ that should not escape their scope. We
eliminate this existential, extend the component $X : \mathcal{S}$ by $\mathcal{S}'$
and existentially quantify over the hypothetical types $P \cup Q$.

(functor F (X : S) = s in b): The signature expression S denotes a family of
semantic structures, $\Lambda P.\mathcal{S}$. For every $\varphi$ with
$\mathcal{D}(\varphi) = P$, F should be applicable at any enrichment, i.e. subtype,
of $\varphi(\mathcal{S})$. To this end, we classify the body s of F in the context
extended with the assumption $X : \mathcal{S}$. By requiring that $P$ is a locally
fresh choice of type variables, we ensure that $\mathcal{S}$ is a generic example of a
structure matching $\Lambda P.\mathcal{S}$, and that variables in $P$ act as formal
type parameters during the classification of the body. Classifying s yields an
existential structure $\mathcal{X}$ that may contain occurrences of the parameters
$P$. If this succeeds for a generic choice of parameters, it will also succeed for
any realisation of these parameters.² We discharge the type parameters by universal
quantification over $P$ and add the assumption that F has the polymorphic type
$\forall P.\mathcal{S} \to \mathcal{X}$ to the context. The scope b of the functor
definition determines the type $\mathcal{X}'$ of the entire phrase.

(F(s)): Provided s has existential structure $\exists P.\mathcal{S}$, we locally
eliminate the quantifier and choose an appropriate instance
$\mathcal{S}' \to \exists Q.\mathcal{S}''$ of the functor’s type. This step corresponds
to eliminating the functor’s polymorphism by choosing a realisation $\varphi$ of its
type parameters. The functor may be applied if the actual argument’s type
$\mathcal{S}$ enriches the instance’s domain $\mathcal{S}'$, i.e. provided
$\mathcal{S}$ is a subtype of $\mathcal{S}'$. The range $\exists Q.\mathcal{S}''$ of the
instance determines the type of the application, and may propagate some of the
hypothetical types in $P$ via the implicit realisation $\varphi$. To prevent these
from escaping their scope, we abstract them by extending the existential
quantification over $\mathcal{S}''$ to cover both $P$ and $Q$.

(s : S): Provided s has existential type $\exists P.\mathcal{S}$ and S denotes a
semantic signature, we first eliminate the existential quantification and then check
that $\mathcal{S}$ matches the denotation of S. The denotation
$\Lambda Q.\mathcal{S}'$ describes a family of semantic structures and the requirement
is that the type $\mathcal{S}$ of the structure expression is a subtype of some member
$\varphi(\mathcal{S}')$ of this family. Since $\varphi$ is applied to $\mathcal{S}'$ in
the conclusion $\exists P.\varphi(\mathcal{S}')$, the actual denotations of type
components that have opaque specifications in S are preserved: however, the
visibility of some components of s may be curtailed. The realised structure
$\varphi(\mathcal{S}')$ may mention hypothetical types in $P$. Existentially
quantifying over $P$ prevents them from escaping their scope.

(s :> S): We proceed as in the previous case, but the type of s :> S is
$\exists Q.\mathcal{S}'$, not $\exists P.\varphi(\mathcal{S}')$. Introducing the
existential quantification over $Q$ hides the realisation, rendering type components
specified opaquely in S abstract.

Before we can state our main result we shall need one last concept: 

Definition 5 (Ground Functors and Contexts) A semantic functor
$\mathcal{F} = \forall P.\mathcal{S} \to \mathcal{X}$ is ground, written
$\vdash \mathcal{F}\ \mathrm{Gnd}$, if and only if $P \subseteq \mathcal{V}(\mathcal{S})$.
A context $C$ is ground, written $\vdash C\ \mathrm{Gnd}$, precisely when all the
semantic functors in its range are ground.

The ground property of a semantic functor $\mathcal{F}$ ensures that whenever we
apply a functor of this type, the free variables of the range are either propagated
from the actual argument, or were already free in $\mathcal{F}$. With this
observation one can prove the following free variable lemma:

Lemma 1 (Free Vars) If $\vdash C\ \mathrm{Gnd}$ then $C \vdash b/s : \mathcal{X}$
implies $\mathcal{V}(\mathcal{X}) \subseteq \mathcal{V}(C)$.

Note that the ground property of contexts is preserved as an invariant of the 
classification rules. We only need to impose it when reasoning about classifica- 
tions derived with respect to an arbitrary context, which might be non-ground. 
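
The ground condition of Definition 5 is a simple containment check on free
variables. A minimal Python sketch, assuming a flattened encoding in which a
structure is a dictionary from component names to types (strings or
("->", dom, cod) tuples) or to nested structures; `is_ground` and `free_vars` are
hypothetical names.

```python
def free_vars(obj):
    """Type variables occurring in a type or in a (nested) structure."""
    if isinstance(obj, dict):
        return set().union(*(free_vars(v) for v in obj.values()))
    if isinstance(obj, tuple):                       # ("->", dom, cod)
        return free_vars(obj[1]) | free_vars(obj[2])
    return set() if obj == "int" else {obj}


def is_ground(params, domain):
    """forall params. domain -> range is ground if every parameter occurs
    free in the domain structure."""
    return set(params) <= free_vars(domain)


# forall {a}. (t |> a, x : a) -> ...   every parameter is fixed by the argument
ground_case = is_ground({"a"}, {"t": "a", "x": "a"})
# forall {a}. (x : int) -> ...         the argument says nothing about a
non_ground_case = is_ground({"a"}, {"x": "int"})
```

In the second case the argument's type cannot determine the realisation of a, which
is the situation the ground condition rules out.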

² It can be shown that derivations are closed under realisation; hence for any
$\varphi$ with domain $P$, because $C, X : \mathcal{S} \vdash s : \mathcal{X}$ we also
know that $\varphi((C, X : \mathcal{S})) \vdash s : \varphi(\mathcal{X})$, and this is
equivalent to $C, X : \varphi(\mathcal{S}) \vdash s : \varphi(\mathcal{X})$, since
$P \cap \mathcal{V}(C) = \emptyset$.

We can revisit the example of Section 4.1 to demonstrate how our alternative
semantics maintains soundness, without relying on a global state of generated
type variables:

    structure [_(t ▷ α, c : int → α, d : α → int) X =
        ^(∃{α}.(t ▷ α, c : int → α, d : α → int))[ struct datatype t^α = int with c, d end;
    structure [_(X : ε_S, u ▷ β, c : (int → int) → β, d : β → int → int) Y =
        ^(∃{α}.(X : ε_S, u ▷ α, c : (int → int) → α, d : α → int → int))[ struct structure X = struct end;
                     datatype u^α = int → int with c, d
              end;
    val x = (Y.d^(β→int→int) (X.c^(int→α) 1)^α) 2
                             ~~~~~~~~~~~~~~~~~

Assuming the initial context is empty, we have indicated the existential types
of the defining structure expressions using the notation $^{\mathcal{X}}[\,s$, and the
types of the identifiers X and Y in the context using the notation
$[_{\mathcal{S}}\,X$. We have also indicated the type variables chosen to represent
t and u at their point of definition.

The existential type of the structure expression defining X is:

$$\exists\{\alpha\}.(t \triangleright \alpha,\ c : \mathtt{int} \to \alpha,\ d : \alpha \to \mathtt{int}).$$

Since $\alpha$ is fresh for the empty context, we can eliminate this existential
quantifier directly so that, after the definition of X, the context of Y contains a
free occurrence of $\alpha$. As in the unsound state-less semantics discussed in
Section 4.1, we are free to re-use $\alpha$ to represent u at the definition of u,
because $\alpha$ no longer occurs in the context after the second definition of X.
However, inspecting the existential type,

$$\mathcal{X} = \exists\{\alpha\}.(X : \epsilon_{\mathcal{S}},\ u \triangleright \alpha,\ c : (\mathtt{int} \to \mathtt{int}) \to \alpha,\ d : \alpha \to \mathtt{int} \to \mathtt{int}),$$

of the structure expression defining Y, we can see that this variable is
distinguished from the free occurrence of $\alpha$ in the context by the fact that it
is existentially bound. Before we can extend the context with the type of Y, we
need to eliminate this existential quantifier. The first side-condition of Rule 8
requires that we avoid capturing the free occurrence of $\alpha$ in the context of Y.
To do this, it is necessary to choose a renaming of $\mathcal{X}$, in this case

$$\exists\{\beta\}.(X : \epsilon_{\mathcal{S}},\ u \triangleright \beta,\ c : (\mathtt{int} \to \mathtt{int}) \to \beta,\ d : \beta \to \mathtt{int} \to \mathtt{int}),$$

for a variable $\beta$ that is locally fresh for the context of Y, and, in particular,
distinct from $\alpha$. After eliminating the renamed quantifier and extending the
context with the type of Y, the abstract types X.t and Y.u are correctly
distinguished by $\alpha$ and $\beta$, catching the underlined type violation in the
definition of x.
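
The renaming step that makes the side condition of Rule 8 satisfiable is ordinary
capture-avoiding renaming of bound variables. The Python sketch below illustrates it
on the existential type of Y; the dictionary encoding of structures and the helper
names (`rename_apart`, `subst`, `free_vars`) are assumptions of this illustration,
not notation from the paper.

```python
import string


def free_vars(obj):
    """Type variables occurring in a type or in a (nested) structure."""
    if isinstance(obj, dict):
        return set().union(*(free_vars(v) for v in obj.values()))
    if isinstance(obj, tuple):                       # ("->", dom, cod)
        return free_vars(obj[1]) | free_vars(obj[2])
    return set() if obj == "int" else {obj}


def subst(phi, obj):
    """Apply a renaming to a type or to every component of a structure."""
    if isinstance(obj, dict):
        return {k: subst(phi, v) for k, v in obj.items()}
    if isinstance(obj, tuple):
        return ("->", subst(phi, obj[1]), subst(phi, obj[2]))
    return phi.get(obj, obj)


def rename_apart(bound, body, avoid):
    """Rename the bound variables of (exists bound. body) so that none of
    them occurs in avoid; returns the new bound variables and body."""
    taken = set(avoid) | (free_vars(body) - set(bound))
    phi = {}
    for v in sorted(bound):
        if v in taken:
            new = next(c for c in string.ascii_lowercase
                       if c not in taken and c not in bound)
            phi[v] = new
            taken.add(new)
        else:
            taken.add(v)
    return sorted(phi.get(v, v) for v in bound), subst(phi, body)


# The context of Y contains X : (t |> a, ...), so a occurs free in it.
context_vars = {"a"}
y_bound = ["a"]
y_body = {"X": {}, "u": "a",
          "c": ("->", ("->", "int", "int"), "a"),
          "d": ("->", "a", ("->", "int", "int"))}
new_bound, new_body = rename_apart(y_bound, y_body, context_vars)
```

After renaming, Y.u is represented by b while X.t keeps a, so the argument type of
Y.d no longer coincides with the result type of X.c.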

5 Main Result 



Having defined our systems, we can now state the main result of the paper:

Theorem 1 (Main Result) Provided $\vdash C\ \mathrm{Gnd}$ and $C, N$ rigid:

Completeness If $C, N \vdash b/s : \mathcal{S} \Rightarrow M$ then
$C \vdash b/s : \exists M.\mathcal{S}$.

Soundness If $C \vdash b/s : \mathcal{X}$ then there exist $\mathcal{S}$ and $M$ such
that $C, N \vdash b/s : \mathcal{S} \Rightarrow M$ with
$\mathcal{X} = \exists M.\mathcal{S}$.



An operational view of the systems in Figures 5 and 6 is that we have re- 
placed the notion of global generativity by local generativity and the ability 
to rename bound variables when necessary. The proof of completeness is easy 
because a variable that is globally fresh will certainly be locally fresh, enabling 
a straightforward construction of a corresponding state-less derivation. 



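Proof (Completeness). By strong induction on the generative classification rules.
We only describe the case for structure definitions; the others are similar:

Rule 4 Assume $C, N \vdash s : \mathcal{S} \Rightarrow P$ (i),
$(C, X : \mathcal{S}), N \cup P \vdash b : \mathcal{S}' \Rightarrow Q$ (ii) and
induction hypotheses
$\vdash C\ \mathrm{Gnd} \supset C, N\ \text{rigid} \supset C \vdash s : \exists P.\mathcal{S}$ (iii)
and
$\vdash C, X : \mathcal{S}\ \mathrm{Gnd} \supset (C, X : \mathcal{S}), N \cup P\ \text{rigid} \supset C, X : \mathcal{S} \vdash b : \exists Q.\mathcal{S}'$ (iv).
Suppose $\vdash C\ \mathrm{Gnd}$ (v) and $C, N$ rigid (vi). Now by induction
hypothesis (iii) on (v) and (vi) we obtain $C \vdash s : \exists P.\mathcal{S}$ (vii).
Property 1 of (i), together with (vi), ensures that
$P \cap \mathcal{V}(C) = \emptyset$ (viii). Clearly (v) extends to
$\vdash C, X : \mathcal{S}\ \mathrm{Gnd}$ (ix). Lemma 1 on (v) and (vii) ensures
$\mathcal{V}(\exists P.\mathcal{S}) \subseteq \mathcal{V}(C)$. It follows from (vi) that
$\mathcal{V}(\mathcal{S}) \subseteq N \cup P$ (x) and consequently
$(C, X : \mathcal{S}), N \cup P$ rigid (xi). Applying induction hypothesis (iv) to
(ix) and (xi) yields $C, X : \mathcal{S} \vdash b : \exists Q.\mathcal{S}'$ (xii).
Property 1 of (ii) ensures $Q \cap (N \cup P) = \emptyset$ which, together with (x),
entails $Q \cap (P \cup \mathcal{V}(\mathcal{S})) = \emptyset$ (xiii). Rule 8 on (vii),
(viii), (xii) and (xiii) derives the desired result
$C \vdash (\mathtt{structure}\ X = s;\ b) : \exists P \cup Q.(X : \mathcal{S}, \mathcal{S}')$.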



In the complete proof, Property 1 and Lemma 1 conspire to ensure the side
conditions on existentially bound variables and hence that implicit renamings of 
these variables are never required. 



5.1 Soundness 

Soundness is more difficult to prove, because the state-less rules in Figure 6 only
require subderivations to hold for particular choices of locally fresh variables. A
variable may be locally fresh without being globally fresh, foiling naive attempts
to construct a generative derivation from a state-less derivation.

To address this problem, we introduce a modified formulation of the state-less
classification judgements with the judgement forms $C \vdash' b : \mathcal{X}$ and
$C \vdash' s : \mathcal{X}$ that have similar rules but with stronger premises.
Instead of requiring premises to hold for particular choices of fresh variables, the
modified rules require them to hold for every choice of variables. To express these
rules, we define the concept of a renaming
$\pi \in \mathit{Var} \xrightarrow{\text{fin}} \mathit{Type}$ that is similar to a
realisation, but simply maps type variables to type variables. The operation of
applying a renaming to a semantic object $O$, written $\pi(O)$, is extended to all
semantic objects in a way that avoids the capture of free variables by bound
variables. For instance, the modified version of Rule 8 receives a stronger second
premise that subsumes the second and third premises of the original rule:

$$\frac{C \vdash' s : \exists P.\mathcal{S} \qquad
        \forall \pi.\ \mathcal{D}(\pi) = P \supset C, X : \pi(\mathcal{S}) \vdash' b : \pi(\exists Q.\mathcal{S}') \qquad
        Q \cap (P \cup \mathcal{V}(\mathcal{S})) = \emptyset}
       {C \vdash' (\mathtt{structure}\ X = s;\ b) : \exists P \cup Q.(X : \mathcal{S}, \mathcal{S}')}$$

Similar changes are required to Rules 7, 9 and 10 that extend the context with ob- 
jects containing locally fresh variables. The generalised premises make it easy to 
construct a generative derivation from the derivation of a generalised judgement. 
Although these rules are not finitely branching, the judgements are well-founded 
and amenable to inductive arguments. This technique is adapted from [12]. 

Our proof strategy is to first show that any derivation of a state-less judge- 
ment gives rise to a corresponding derivation of a generalised judgement: 

Lemma 2 If $\vdash C\ \mathrm{Gnd}$ and $C \vdash b/s : \mathcal{X}$ then
$C \vdash' b/s : \mathcal{X}$.

We then show that any derivation of a generalised judgement gives rise to a
corresponding generative derivation:

Lemma 3 If $\vdash C\ \mathrm{Gnd}$ and $C \vdash' b/s : \mathcal{X}$ then, for any $N$
satisfying $C, N$ rigid, there exist $\mathcal{S}$ and $M$ such that
$C, N \vdash b/s : \mathcal{S} \Rightarrow M$, with $\mathcal{X} = \exists M.\mathcal{S}$.

The proofs require stronger induction hypotheses and are technically in- 
volved. Further details can be found in the author’s thesis [15]. 

6 Contribution 

Theorem 1 is an equivalence result, but we propose that the state-less semantics 
provides a better conceptual understanding of the type structure of Standard ML. 

The core type phrase sp.t, which introduces a dependency of Mini-SML’s type 
syntax on its term syntax, suggests that Mini-SML’s type structure is based on 
first-order dependent types. However, arguing from our semantics, we can show 
that first-order dependent types play no role in the semantics. 

Compare the syntactic types of Mini-SML with their semantic counterparts,
the semantic objects that are used to classify Mini-SML terms. Where type
phrases allow occurrences of type identifiers and term dependent projections
sp.t, semantic types instead allow occurrences of type variables
$\alpha \in \mathit{Var}$. Type variables range over semantic types and are thus
second-order variables. While the component specifications of a signature expression
are dependent, in that subsequent specifications can refer to term identifiers
specified previously in the body, the body of a semantic signature is just an
unordered finite map, with no dependency between its components: the identifiers in
a semantic structure, like the field names of record types, do not have scope. Thus
there is a clear distinction between syntactic types and semantic objects: where
syntactic types have first-order dependencies on term identifiers, semantic types
have second-order dependencies on type variables.

The reduction of first-order to second-order dependencies is achieved by
Mini-SML’s denotation judgements. In particular, the denotation of the term
dependent type sp.t is determined by the type, not the value, of the term sp. In
conjunction with Rule 2, which assigns type variables to opaque type
specifications, Rule 1 reduces the first-order dependencies of syntactic types on
terms to second-order dependencies of semantic types on type variables.

We can illustrate this reduction by comparing the following signature expres- 
sion with its denotation: 

sig structure X : sig type 
end; 

structure Y : sig type 
type 
end; 

val y: X.t — > Y.v 
end 



u; 

V = X.t 



T{a, /?}. (X : (t > a), 

Y : (u>/3, 

V > a ^ /3), 

y : a ^ P) 



The opaque types t and u are represented by type variables $\alpha$ and $\beta$; the depen-
dency on the terms X and Y in the specifications of v and y has disappeared.

As another example, let S be the above signature expression and consider 
the following functor and its type: 

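functor F(Z : S) =
  struct type w = Z.X.t -> Z.Y.v
         val z = Z.y
  end

$$\forall\{\alpha, \beta\}.(X : (t \triangleright \alpha),\; Y : (u \triangleright \beta,\; v \triangleright \alpha \to \beta),\; y : \alpha \to \alpha \to \beta) \to \exists\emptyset.(w \triangleright \alpha \to \alpha \to \beta,\; z : \alpha \to \alpha \to \beta)$$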

F returns the type w whose definition depends on the term argument Z. In the 
semantic object, this first-order dependency has been eliminated, in favour of a 
second-order dependency on the functor’s type parameters $\alpha$ and $\beta$.

Our choice of binding notation and the reformulation of the generative classi- 
fication rules further underline the fact that Mini-SML, and thus Standard ML, is 
based on a purely second-order type theory. In this interpretation, signatures are 
types parameterised on type variables, functors are polymorphic functions whose
types have universally quantified type variables, and structure expressions have 
types with existentially quantified type variables. A universal quantifier is ex- 
plicitly introduced when a functor is defined and silently eliminated when it is 
applied. An existential quantifier is explicitly introduced by a datatype defini- 
tion or an opaque signature constraint, and silently eliminated and re-introduced 
at other points in the semantics. (The limited computation on modules means 
that, unlike the first-class existential types of [13], the witness of an existential 
type can depend at most on the static interpretation of type variables in the 
context, but never on the dynamic interpretation of term identifiers.) Allowing 
a functor’s actual argument and a constraint’s structure expression to have a 
richer type is an appeal to subtyping that can easily be factored into a separate 
subsumption rule as in traditional formalisations of subtyping in Type Theory. 
We have not done this to keep the classification rules syntax directed: this avoids 
admitting non-principal classifications and simplifies the statement and proof of
soundness. 

The style of semantics presented here scales naturally to both higher-order 
and first-class modules [15,16]. Both extensions are compatible with ML-style 
type inference for the Core. The higher-order extension is competitive with, though
subtly different from, the calculi of [2,7,8,10]. Where these calculi have the ad-
vantage of syntactic types, ours has the advantage of enjoying principal types.
In [2,7,8,10], the application of a functor to an anonymous argument may not 
have a syntactic type unless one can promote the functor’s range to a supertype 
that does not propagate any opaque types of the actual argument. This proviso 
leads to a loss of principal types, since there may be two or more unrelated 
types to which one may promote the range when it is a functor signature: one 
can narrow its domain or promote its range (Section 9.2.2 of [15]). Because the 
style of semantics presented here can represent anonymous opaque types using 
existential type variables, there is no need to promote the functor’s range in an 
arbitrary manner, preserving principality. The extension to first-class modules, 
which requires just three new Core constructs to specify, introduce and elimi- 
nate first-class module types, has a decidable type checking problem. It avoids 
the undecidability of subtyping in the first-class module calculi of [2,10] by pre- 
serving the distinction between Modules and Core level types and disallowing 
subtyping between Core types that encapsulate modules. 

Independent of this work: [6] uses type parameterisation and quantification 
to model a fragment of Standard ML Modules; [1] presents a declarative type 
system similar to ours to simplify the proof of correctness of a novel compila- 
tion scheme for Modules; [17] combines type parameterisation and quantification 
with non-standard first-order dependent types in an explicitly typed modules 
language, promising the advantages of both. 

Abstract. OPL is a modeling language for mathematical programming
and combinatorial optimization problems. It is the first modeling lan- 
guage to combine high-level algebraic and set notations from model- 
ing languages with a rich constraint language and the ability to specify 
search procedures and strategies that is the essence of constraint pro- 
gramming. In addition, OPL models can be controlled and composed 
using OPLScript, a script language that simplifies the development of ap- 
plications that solve sequences of models, several instances of the same 
model, or a combination of both as in column-generation applications. 
This paper illustrates some of the functionalities of OPL for constraint 
programming using frequency allocation, sport-scheduling, and job-shop 
scheduling applications. It also illustrates how OPL models can be com- 
posed using OPLScript on a simple configuration example. 



1 Introduction 

Combinatorial optimization problems are ubiquitous in many practical appli- 
cations, including scheduling, resource allocation, planning, and configuration 
problems. These problems are computationally difficult (i.e., they are NP-hard) 
and require considerable expertise in optimization, software engineering, and the 
application domain. 

The last two decades have witnessed substantial development in tools to sim- 
plify the design and implementation of combinatorial optimization problems. 
Their goal is to decrease development time substantially while preserving most 
of the efficiency of specialized programs. Most tools can be classified in two 
categories: mathematical modeling languages and constraint programming lan- 
guages. Mathematical modeling languages such as AMPL [4] and GAMS [1] 
provide very high-level algebraic and set notations to concisely express math-
ematical problems that can then be solved using state-of-the-art solvers. These
modeling languages do not require specific programming skills and can be used 
by a wide audience. Constraint programming languages such as CHIP [3], PRO- 
LOG III and its successors [2], OZ [12], and Ilog Solver [11] have orthogonal 
strengths. Their constraint languages, and their underlying solvers, go beyond
traditional linear and nonlinear constraints and support logical, high-order, and 
global constraints. They also make it possible to program search procedures to 
specify how to explore the search space. However, these languages are mostly 
aimed at computer scientists and often have weaker abstractions for algebraic 
and set manipulation. 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 98-116, 1999. 

© Springer-Verlag Berlin Heidelberg 1999




Constraint Programming in OPL 






The work described in this paper originated as an attempt to unify mod- 
eling and constraint programming languages and their underlying implementa- 
tion technologies. It led to the development of the optimization programming 
language OPL [13], its associated script language OPLScript [14], and its develop- 
ment environment OPL Studio. 

OPL is a modeling language sharing high-level algebraic and set notations 
with traditional modeling languages. It also contains some novel functionalities 
to exploit sparsity in large-scale applications, such as the ability to index arrays 
with arbitrary data structures. OPL shares with constraint programming lan- 
guages their rich constraint languages, their support for scheduling and resource 
allocation problems, and the ability to specify search procedures and strategies. 
OPL also makes it easy to combine different solver technologies for the same 
application. 

OPLScript is a script language for composing and controlling OPL models. 
Its motivation comes from the many applications that require solving several 
instances of the same problem (e.g., sensitivity analysis), sequences of models, or
a combination of both as in column-generation applications. OPLScript supports 
a variety of abstractions to simplify these applications, such as OPL models as 
first-class objects, extensible data structures, and linear programming bases to 
name only a few. 

OPL Studio is the development environment of OPL and OPLScript. Beyond sup-
port for the traditional "edit, execute, and debug" cycle, it provides automatic
visualizations of the results (e.g., Gantt charts for scheduling applications), vi- 
sual tools for debugging and monitoring OPL models (e.g., visualizations of the 
search space), and C++ code generation to integrate an OPL model in a larger 
application. The code generation produces a class for each model object and
makes it possible to add/remove constraints dynamically and to overwrite the 
search procedure. 

The purpose of this paper is to illustrate some of the constraint programming 
features of OPL through a number of models. Section 2 describes a model for a 
frequency allocation application that illustrates how to use high-level algebraic 
and set manipulation, how to exploit sparsity, and how to implement search 
procedures in OPL. Section 3 describes a model for a sport-scheduling applica-
tion that illustrates the use of global constraints in OPL. Section 4 describes
an application that illustrates the support for scheduling applications and for 
search strategies in OPL. Section 5 shows how OPL models can be combined using 
OPLScript on a configuration application. All these applications can be run on 
ILOG OPL Studio 2.1. 

2 Frequency Allocation 

The frequency-allocation problem [11] illustrates a number of interesting features 
of OPL: the use of complex quantifiers, and the use of a multi-criterion ordering 
to choose which variable to assign next. It also features an interesting data 
representation that is useful in large-scale linear models. 




P. Van Hentenryck et al.



The frequency-allocation problem consists of allocating frequencies to a num- 
ber of transmitters so that there is no interference between transmitters and the 
number of allocated frequencies is minimized. The problem described here is an 
actual cellular phone problem where the network is divided into cells, each cell 
containing a number of transmitters whose locations are specified. The interfer- 
ence constraints are specified as follows: 

— The distance between two transmitter frequencies within a cell must not be 
smaller than 16. 

— The distances between two transmitter frequencies from different cells vary 
according to their geographical situation and are described in a matrix. 

The problem of course consists of assigning frequencies to transmitters to avoid 
interference and, if possible, to minimize the number of frequencies. The rest of 
this section focuses on finding a solution using a heuristic to reduce the number 
of allocated frequencies. 



int nbCells = ...;
int nbFreqs = ...;
range Cells 1..nbCells;
range Freqs 1..nbFreqs;
int nbTrans[Cells] = ...;
int distance[Cells,Cells] = ...;
struct TransmitterType { Cells c; int t; };
{TransmitterType} Transmits = { <c,t> | c in Cells & t in 1..nbTrans[c] };
var Freqs freq[Transmits];

solve {
   forall(c in Cells & ordered t1, t2 in 1..nbTrans[c])
      abs(freq[<c,t1>] - freq[<c,t2>]) >= 16;
   forall(ordered c1, c2 in Cells : distance[c1,c2] > 0)
      forall(t1 in 1..nbTrans[c1] & t2 in 1..nbTrans[c2])
         abs(freq[<c1,t1>] - freq[<c2,t2>]) >= distance[c1,c2];
};

search {
   forall(t in Transmits
          ordered by increasing <dsize(freq[t]),nbTrans[t.c]>)
      tryall(f in Freqs ordered by decreasing nbOccur(f,freq))
         freq[t] = f;
};

Fig. 1. The Frequency-Allocation Problem (alloc.mod).

Figure 1 shows an OPL statement for the frequency-allocation problem and
Figure 2 describes the instance data. Note the separation between models and 
data which is an interesting feature of OPL. The model data first specifies the 
number of cells (25 in the instance), the number of available frequencies (256 
in the instance), and their associated ranges. The next declarations specify the 
number of transmitters needed for each cell and the distance between cells. For 
example, in the instance, cell 1 requires eight transmitters while cell 3 requires 
six transmitters. The distance between cell 1 and cell 2 is 1. 

The first interesting feature of the model is how variables are declared: 

struct TransmitterType { Cells c; int t; };
{TransmitterType} Transmits = { <c,t> | c in Cells & t in 1..nbTrans[c] };
var Freqs freq[Transmits];

As is clear from the problem statement, transmitters are contained within cells. 
The above declarations preserve this structure, which will be useful when stating 
constraints. A transmitter is simply described as a record containing a cell num- 
ber and a transmitter number inside the cell. The set of transmitters is computed 
automatically from the data using 

{TransmitterType} Transmits = { <c,t> | c in Cells & t in 1..nbTrans[c] };

which considers each cell and each transmitter in the cell. OPL supports a rich 
language to compute with sets of data structures and this instruction illustrates 
some of this functionality. The model then declares an array of variables 

var Freqs freq[Transmits];

indexed by the set of transmitters; the values of these variables are of course 
the frequencies associated with the transmitters. This declaration illustrates a
fundamental aspect of OPL: arrays can be indexed by arbitrary data. In this appli-
cation, the array of variables freq is indexed by the elements of Transmits,
which are records. This functionality is of primary importance to exploit spar-
sity in large-scale models and to simplify the statement of many combinatorial
optimization problems.
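
As an illustration outside OPL, the same indexing idea can be sketched in Python, where a dictionary keyed by (cell, transmitter) pairs plays the role of the array freq. The instance data below is hypothetical and much smaller than the instance of Figure 2; the names nb_trans, transmits, and dense_size are assumptions made for this sketch only.

```python
# Hypothetical instance data (illustrative only, not the paper's instance).
nb_trans = {1: 2, 2: 1, 3: 3}   # cell -> number of transmitters in that cell

# Counterpart of:
#   {TransmitterType} Transmits = { <c,t> | c in Cells & t in 1..nbTrans[c] };
transmits = [(c, t) for c in sorted(nb_trans) for t in range(1, nb_trans[c] + 1)]

# Counterpart of: var Freqs freq[Transmits];  (None stands for "not yet assigned")
freq = {tr: None for tr in transmits}

# A dense Cells x max(nbTrans) array would allocate entries that are never used.
dense_size = len(nb_trans) * max(nb_trans.values())
print(len(freq), "sparse entries versus", dense_size, "dense entries")
```

Only the pairs that actually exist are stored, which is the sparsity benefit described above.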

There are two main groups of constraints in this model. The first set of 
constraints handles the distance constraints between transmitters inside a cell. 
The instruction 

forall(c in Cells & ordered t1, t2 in 1..nbTrans[c])
   abs(freq[<c,t1>] - freq[<c,t2>]) >= 16;

enforces the constraint that the distance between two transmitters inside a cell
is at least 16. The instruction is compact mainly because we can quantify sev-
eral variables in forall statements and because of the keyword ordered, which
makes sure that the statement considers triples <c,t1,t2> where t1 < t2. Of
particular interest are the expressions freq[<c,t1>] and freq[<c,t2>], illus-
trating that the indices of array freq are records of the form <c,t>, where c is
a cell and t is a transmitter. Note also that the distance is computed using the
function abs, which computes the absolute value of its argument (which may be
an arbitrary integer expression).

The second set of constraints handles the distance constraints between trans- 
mitters from different cells. The instruction 

forall(ordered c1, c2 in Cells : distance[c1,c2] > 0)
   forall(t1 in 1..nbTrans[c1] & t2 in 1..nbTrans[c2])
      abs(freq[<c1,t1>] - freq[<c2,t2>]) >= distance[c1,c2];

considers each pair of distinct cells whose required distance is greater than zero
and each two transmitters in these cells, and states that the distance between 
the frequencies of these transmitters must be at least the distance specified in 
the matrix distance. 
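
The two groups of constraints can be read as a predicate over a complete assignment. The following Python sketch checks such an assignment; it is a plain checker, not a constraint propagator, and the function name satisfies_constraints, the dictionary-based data layout, and the small instance are assumptions of this sketch.

```python
from itertools import combinations

def satisfies_constraints(freq, nb_trans, distance, intra_cell_gap=16):
    """Check a complete assignment freq[(cell, transmitter)] -> frequency."""
    cells = sorted(nb_trans)
    # First group: ordered pairs t1 < t2 of transmitters inside the same cell.
    for c in cells:
        for t1, t2 in combinations(range(1, nb_trans[c] + 1), 2):
            if abs(freq[(c, t1)] - freq[(c, t2)]) < intra_cell_gap:
                return False
    # Second group: ordered pairs of cells c1 < c2 with a positive distance.
    for c1, c2 in combinations(cells, 2):
        d = distance.get((c1, c2), 0)
        if d > 0:
            for t1 in range(1, nb_trans[c1] + 1):
                for t2 in range(1, nb_trans[c2] + 1):
                    if abs(freq[(c1, t1)] - freq[(c2, t2)]) < d:
                        return False
    return True

# Hypothetical two-cell instance.
nb_trans = {1: 2, 2: 1}
distance = {(1, 2): 2}
good = {(1, 1): 1, (1, 2): 17, (2, 1): 3}
bad = {(1, 1): 1, (1, 2): 16, (2, 1): 3}
print(satisfies_constraints(good, nb_trans, distance))
print(satisfies_constraints(bad, nb_trans, distance))
```

The use of combinations mirrors the keyword ordered: each unordered pair is visited once.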

Another interesting part of this model is the search strategy. The basic struc- 
ture is not surprising: OPL considers each transmitter and chooses a frequency 
nondeterministically. The interesting feature of the model is the heuristic. OPL 
chooses to generate a value for the transmitter with the smallest domain and, in 
case of ties, for the transmitter whose cell size is as small as possible. This multi-
criterion heuristic is expressed using a tuple <dsize(freq[t]),nbTrans[t.c]>
to obtain

forall(t in Transmits ordered by increasing <dsize(freq[t]),nbTrans[t.c]>)

Each transmitter is associated with a tuple $\langle s, c \rangle$, where $s$ is the number
of possible frequencies and $c$ is the number of transmitters in the cell to which
the transmitter belongs. A transmitter with tuple $\langle s_1, c_1 \rangle$ is preferred over a
transmitter with tuple $\langle s_2, c_2 \rangle$ if $s_1 < s_2$ or if $s_1 = s_2$ and $c_1 < c_2$.

Once a transmitter has been selected, OPL generates a frequency for it in 
a nondeterministic manner. Once again, the model specifies a heuristic for the 
ordering in which the frequencies must be tried. To reduce the number of fre- 
quencies, the model says to try first those values that were used most often 
in previous assignments. This heuristic is implemented using a nondetermin- 
istic tryall instruction with the order specified using the nbOccur function 
(nbOccur(i,a) denotes the number of occurrences of i in array a at a given 
step of the execution): 

forall(t in Transmits ordered by increasing <dsize(freq[t]),nbTrans[t.c]>)
   tryall(f in Freqs ordered by decreasing nbOccur(f,freq))
      freq[t] = f;

This search procedure is typical of many constraint satisfaction problems and 
consists of using a first heuristic to dynamically choose which variable to instan- 
tiate next (variable choice) and a second heuristic to choose which value to assign 
nondeterministically to the selected variable (value choice). The forall instruc- 
tion is of course deterministic, while the tryall instruction is nondeterministic: 
potentially all possible values are chosen for the selected variable. Note that, on 
the instance depicted in Figure 2, OPL returns a solution with 95 frequencies in 
about 3 seconds. 
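
To make the variable-choice and value-choice heuristics concrete, here is a minimal Python sketch of a backtracking search that follows the same ordering rules. It is not OPL's implementation: it assumes a simple forward-checking scheme to maintain domains, and the function solve, the helper required_gap, and the tiny instance are hypothetical.

```python
def solve(nb_trans, distance, nb_freqs, gap=16):
    """Backtracking search with the two ordering heuristics described above."""
    transmits = [(c, t) for c in sorted(nb_trans) for t in range(1, nb_trans[c] + 1)]

    def required_gap(a, b):
        # Minimum frequency separation between transmitters a and b.
        if a[0] == b[0]:
            return gap
        c1, c2 = sorted((a[0], b[0]))
        return distance.get((c1, c2), 0)

    def search(assigned, domains):
        if len(assigned) == len(transmits):
            return dict(assigned)
        # Variable choice: smallest domain first, ties broken by cell size.
        t = min((u for u in transmits if u not in assigned),
                key=lambda u: (len(domains[u]), nb_trans[u[0]]))
        used = list(assigned.values())
        # Value choice: frequencies used most often so far are tried first.
        for f in sorted(domains[t], key=lambda v: (-used.count(v), v)):
            pruned, consistent = {}, True
            for u in transmits:
                if u in assigned or u == t:
                    continue
                pruned[u] = {v for v in domains[u] if abs(v - f) >= required_gap(t, u)}
                if not pruned[u]:
                    consistent = False   # some transmitter has no frequency left
                    break
            if consistent:
                assigned[t] = f
                result = search(assigned, {**domains, **pruned})
                if result is not None:
                    return result
                del assigned[t]          # backtrack and try the next frequency
        return None

    start = {u: set(range(1, nb_freqs + 1)) for u in transmits}
    return search({}, start)

# Hypothetical instance: two cells, three transmitters, twenty frequencies.
solution = solve({1: 2, 2: 1}, {(1, 2): 1}, 20)
print(solution)
```

Python compares tuples lexicographically, so the key (domain size, cell size) realizes the multi-criterion ordering directly.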

3 Sport Scheduling

This section considers the sport-scheduling problem described in [7,10]. The 
problem consists of scheduling games between n teams over n - 1 weeks. In
addition, each week is divided into n/2 periods. The goal is to schedule a game 
for each period of every week so that the following constraints are satisfied: 

1. Every team plays against every other team; 

2. A team plays exactly once a week; 

3. A team plays at most twice in the same period over the course of the season. 

A solution to this problem for 8 teams is shown in Figure 3. In fact, the problem 
can be made more uniform by adding a "dummy" final week and requesting that
all teams play exactly twice in each period. The rest of this section considers 
this equivalent problem for simplicity. 

Fig. 3. A Solution to the Sport-Scheduling Application with 8 Teams



The sport-scheduling problem is an interesting application for constraint pro- 
gramming. On the one hand, it is a standard benchmark (submitted by Bob 
Daniel) to the well-known MIP library and it is claimed in [7] that state-of- 
the-art MIP solvers cannot find a solution for 14 teams. The OPL models pre- 
sented in this section are computationally much more efficient. On the other 
hand, the sport-scheduling application demonstrates fundamental features of 
constraint programming including global and symbolic constraints. In particu-
lar, the model makes heavy use of arc-consistency [6], a fundamental constraint
satisfaction technique from artificial intelligence.

The rest of this section is organized as follows. Section 3.1 presents an OPL 
model that solves the 14-teams problem in about 44 seconds. Section 3.2 shows
how to specialize it further to find a solution for 14 to 30 teams quickly. Both 
models are based on the constraint programs presented in [10]. 

3.1 A Simple OPL Model 

The simple model is depicted in Figure 4. Its input is the number of teams 
nbTeams. Several ranges are defined from the input: the teams Teams, the weeks 
Weeks, and the extended weeks EWeeks, i.e., the weeks plus the dummy week. 
The model also declares an enumerated type Slots to specify the team position
in a game (home or away). The declarations 

int nbTeams = ...;
range Teams 1..nbTeams;
range Weeks 1..nbTeams-1;
range EWeeks 1..nbTeams;
range Periods 1..nbTeams/2;
range Games 1..nbTeams*nbTeams;
enum Slots = { home, away };

int occur[t in Teams] = 2;
int values[t in Teams] = t;

var Teams team[Periods,EWeeks,Slots];
var Games game[Periods,Weeks];

struct Play { int f; int s; int g; };
{Play} Plays = { <i,j,(i-1)*nbTeams+j> | ordered i, j in Teams };
predicate link(int f,int s,int g) in Plays;

solve {
   forall(w in EWeeks)
      alldifferent( all(p in Periods & s in Slots) team[p,w,s]) onDomain;
   alldifferent(game) onDomain;
   forall(p in Periods)
      distribute(occur,values,all(w in EWeeks & s in Slots) team[p,w,s])
         extendedPropagation;
   forall(p in Periods & w in Weeks)
      link(team[p,w,home],team[p,w,away],game[p,w]);
};

search {
   generate(game);
};

Fig. 4. A Simple Model for the Sport-Scheduling Model.

int occur[t in Teams] = 2;
int values[t in Teams] = t;

specify two arrays that are initialized generically and are used to state con-
straints later on. The array occur can be viewed as a constant function always
returning 2, while the array values can be thought of as the identity function
over teams.

The main modeling idea in this model is to use two classes of variables: team 
variables that specify the team playing on a given week, period, and slot and the 
game variables specifying which game is played on a given week and period. The 
use of game variables makes it simple to state the constraint that every team 
must play against each other team. Games are uniquely identified by their two 
teams. More precisely, a game consisting of home team h and away team a is 
uniquely identified by the integer (h-1)*nbTeams + a. The instruction

var Teams team[Periods,EWeeks,Slots];
var Games game[Periods,Weeks];

declares the variables. These two sets of variables must be linked together to 
make sure that the game and team variables for a given period and a given week 
are consistent. The instructions 

struct Play { int f; int s; int g; };
{Play} Plays = { <i,j,(i-1)*nbTeams+j> | ordered i, j in Teams };

specify the set of legal games Plays for this application. For 8 teams, this set 
consists of tuples of the form 

<1,2,2>
<1,3,3>
...
<7,8,56>

Note that this definition eliminates some symmetries in the problem statement 
since the home team is always smaller than the away team. The instruction 

predicate link(int f,int s,int g) in Plays; 

defines a symbolic constraint by specifying its set of tuples. In other words, 
link(h,a,g) holds if the tuple <h,a,g> is in the set Plays of legal games. This 
symbolic constraint is used in the constraint statement to enforce the relation 
between the game and the team variables. 
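
The set Plays and the symbolic constraint link can be mimicked in Python as a set of triples and a membership test. This is only a sketch on ground values (OPL propagates the constraint over variable domains, which is not modelled here); the names make_plays and link as Python functions are assumptions of this sketch.

```python
from itertools import combinations

def make_plays(nb_teams):
    """Triples <i, j, (i-1)*nbTeams + j> for ordered pairs i < j of teams."""
    return {(i, j, (i - 1) * nb_teams + j)
            for i, j in combinations(range(1, nb_teams + 1), 2)}

def link(plays, h, a, g):
    """Holds exactly when <h, a, g> is one of the legal games."""
    return (h, a, g) in plays

plays = make_plays(8)
print(len(plays), sorted(plays)[:2], max(plays))
```

For 8 teams there are 8*7/2 = 28 legal games, and each game number identifies its pair of teams uniquely.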

The constraint declarations in the model follow almost directly the problem 
description. The constraint 

alldifferent( all(p in Periods & s in Slots) team[p,w,s]) onDomain;

specifies that all the teams scheduled to play on week w must be different. It 
uses an aggregate operator all to collect the appropriate team variables by 
iterating over the periods and the slots and an annotation onDomain to enforce 
arc consistency. See [8] for a description of how to enforce arc consistency on this
global constraint. The constraint 

distribute(occur,values,all(w in EWeeks & s in Slots) team[p,w,s])
   extendedPropagation

specifies that a team plays exactly twice in period p over the course of the
"extended" season. Its first argument specifies the number of occurrences of the
values specified by the second argument in the set of variables specified by the
third argument, which collects all team variables of period p. The annotation
extendedPropagation requests that arc consistency be enforced on this constraint.
See [9] for a description of how to enforce arc consistency on this global
constraint. The constraint

alldifferent(game) onDomain;

specifies that all games are different, i.e., that all teams play against each other 
team. These constraints illustrate some of the global constraints of OPL. Other 
global constraints in the current version include a sequencing constraint, a circuit 
constraint, and a variety of scheduling constraints. Finally, the constraint 

link(team[p,w,home],team[p,w,away],game[p,w]);

is most interesting. It specifies that the game game[p,w] consists of the teams
team[p,w,home] and team[p,w,away]. OPL enforces arc consistency on this sym-
bolic constraint.
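
The meaning of the two global constraints on fully assigned variables can be sketched in Python as follows. These functions only test ground values; they do not perform the arc-consistency filtering of [8,9], and reading distribute as "values[i] occurs exactly occur[i] times" is taken from the description above. The sample rows are hypothetical.

```python
from collections import Counter

def alldifferent(xs):
    """True when no value occurs twice in xs."""
    xs = list(xs)
    return len(set(xs)) == len(xs)

def distribute(occur, values, xs):
    """True when each values[i] occurs exactly occur[i] times in xs."""
    counts = Counter(xs)
    return all(counts[v] == k for k, v in zip(occur, values))

# Hypothetical data for 4 teams: one period over the 4 extended weeks
# (home and away slots flattened into a single list of 8 entries).
teams = [1, 2, 3, 4]
occur = [2 for _ in teams]      # constant function returning 2
values = list(teams)            # identity function over teams
period_ok = [1, 2, 3, 4, 1, 3, 2, 4]
period_bad = [1, 2, 1, 3, 1, 4, 2, 3]
print(distribute(occur, values, period_ok), distribute(occur, values, period_bad))
print(alldifferent([1, 2, 3, 4]), alldifferent([1, 2, 2, 4]))
```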

The search procedure in this statement is extremely simple and consists of 
generating values for the games using the first-fail principle. Note also that gener- 
ating values for the games automatically assigns values to the team by constraint 
propagation. As mentioned, this model finds a solution for 14 teams in about 44 
seconds on a modern PC (400 MHz).



3.2 A Round-Robin Model 

The simple model has many symmetries that enlarge the search space consid- 
erably. In this section, we describe a model that uses a round-robin schedule to 
determine which games are played in a given week. As a consequence, once the 
round-robin schedule is selected, it is only necessary to determine the period of 
each game, not its schedule week. In addition, it turns out that a simple round- 
robin schedule makes it possible to find solutions for large numbers of teams. 
The model is depicted in Figures 5 and 6. 

The main novelty in the statement is the array roundRobin that specifies the 
games for every week. Assuming that n denotes the number of teams, the basic 
idea is to fix the set of games of the first week as 

$$\{\langle 1, 2 \rangle\} \cup \{\langle p + 1,\; n - p + 2 \rangle \mid p > 1\}$$

where p is a period identifier. Games of the subsequent weeks are computed by 
transforming a tuple $\langle f, s \rangle$ into a tuple $\langle f', s' \rangle$ where

$$f' = \begin{cases} 1 & \text{if } f = 1 \\ 2 & \text{if } f = n \\ f + 1 & \text{otherwise} \end{cases}$$

int nbTeams = ...;
range Teams 1..nbTeams;
range Weeks 1..nbTeams-1;
range EWeeks 1..nbTeams;
range Periods 1..nbTeams/2;
range Games 1..nbTeams*nbTeams;
enum Slots = { home, away };

int occur[t in Teams] = 2;
int values[t in Teams] = t;

var Teams team[Periods,EWeeks,Slots];
var Games game[Periods,Weeks];

struct Play { int f; int s; int g; };
{Play} Plays = { <i,j,(i-1)*nbTeams+j> | ordered i, j in Teams };
predicate link(int f,int s,int g) in Plays;

Play roundRobin[Weeks,Periods];
initialize {
   roundRobin[1,1].f = 1;
   roundRobin[1,1].s = 2;
   forall(p in Periods : p > 1) {
      roundRobin[1,p].f = p+1;
      roundRobin[1,p].s = nbTeams - (p-2);
   };
   forall(w in Weeks: w > 1) {
      forall(p in Periods) {
         if roundRobin[w-1,p].f <> 1 then
            if roundRobin[w-1,p].f = nbTeams then roundRobin[w,p].f = 2
            else roundRobin[w,p].f = roundRobin[w-1,p].f + 1 endif
         else
            roundRobin[w,p].f = roundRobin[w-1,p].f;
         endif;
         if roundRobin[w-1,p].s = nbTeams then roundRobin[w,p].s = 2
         else roundRobin[w,p].s = roundRobin[w-1,p].s + 1 endif;
      }
   };
   forall(w in Weeks, p in Periods)
      if roundRobin[w,p].f < roundRobin[w,p].s then
         roundRobin[w,p].g = nbTeams*(roundRobin[w,p].f-1) + roundRobin[w,p].s
      else
         roundRobin[w,p].g = nbTeams*(roundRobin[w,p].s-1) + roundRobin[w,p].f
      endif;
};
{int} domain[w in Weeks] = { roundRobin[w,p].g | p in Periods };

Fig. 5. A Round-Robin Model for the Sport-Scheduling Model (Part I).

solve {
   forall(p in Periods & w in Weeks)
      game[p,w] in domain[w];
   forall(w in EWeeks)
      alldifferent( all(p in Periods & s in Slots) team[p,w,s]) onDomain;
   alldifferent(game) onDomain;
   forall(p in Periods)
      distribute(occur,values,all(w in EWeeks & s in Slots) team[p,w,s])
         extendedPropagation;
   forall(p in Periods & w in Weeks)
      link(team[p,w,home],team[p,w,away],game[p,w]);
};

search {
   forall(p in Periods) {
      generateSeq(game[p]);
      forall(po in Periods : po > 1)
         generate(game[po,p]);
   };
};

Fig. 6. A Round-Robin Model for the Sport-Scheduling Model (Part II).



and

$$s' = \begin{cases} 2 & \text{if } s = n \\ s + 1 & \text{otherwise} \end{cases}$$

This round-robin schedule is computed in the initialize instruction and the 
last instruction computes the game associated with the teams. The instruction 

{int} domain[w in Weeks] = { roundRobin[w,p].g | p in Periods };

defines the games played in a given week. This array is used in the constraint 

game[p,w] in domain[w];

which forces the game variables of period p and of week w to take a game allocated 
to that week. 
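
The round-robin construction can be checked independently with a short Python sketch that follows the first-week definition and the two update rules for f and s, and then computes the game numbers and the per-week domains. The function round_robin and the dictionary layout are assumptions of this sketch, not part of the OPL model.

```python
def round_robin(n):
    """Return rr[w][p] = (f, s, g) for weeks w = 1..n-1 and periods p = 1..n/2."""
    periods = range(1, n // 2 + 1)
    rr = {1: {1: (1, 2)}}
    for p in periods:
        if p > 1:
            rr[1][p] = (p + 1, n - (p - 2))      # first-week games
    for w in range(2, n):
        rr[w] = {}
        for p in periods:
            f, s = rr[w - 1][p]
            f_next = 1 if f == 1 else (2 if f == n else f + 1)
            s_next = 2 if s == n else s + 1
            rr[w][p] = (f_next, s_next)

    def game(f, s):
        # Same numbering as Plays: the smaller team is used as the first component.
        lo, hi = min(f, s), max(f, s)
        return n * (lo - 1) + hi

    return {w: {p: (f, s, game(f, s)) for p, (f, s) in row.items()}
            for w, row in rr.items()}

schedule = round_robin(8)
domain = {w: {g for (_, _, g) in row.values()} for w, row in schedule.items()}
all_games = set().union(*domain.values())
print(schedule[1])
print(len(all_games))
```

For 8 teams the sketch yields 7 weeks of 4 games each, every team appears once per week, and the 28 game numbers are pairwise distinct, so fixing game[p,w] to domain[w] loses no pairing.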

The model also contains a novel search procedure that consists of generating 
values for the games in the first period and in the first week, then in the second 
period and the second week, and so on. Figure 7 depicts the experimental results
for various numbers of teams. It is possible to improve the model further by 
exploiting even more symmetries: see [10] for complete details. 

Fig. 7. Experimental Results for the Sport-Scheduling Model

4 Job-Shop Scheduling

One of the other significant features of OPL is its support for scheduling applica-
tions. OPL has a variety of domain-specific concepts for these applications that
are translated into state-of-the-art algorithms. To name only a few, they include
the concepts of activities, unary, discrete, and state resources, reservoirs, and
breaks as well as the global constraints linking them.

Figure 8 describes a simple job-shop scheduling model. The problem is to 
schedule a number of jobs on a set of machines to minimize completion time, 
often called the makespan. Each job is a sequence of tasks and each task requires 
a machine. Figure 8 first declares the number of machines, the number of jobs, 
and the number of tasks in the jobs. The main data of the problem, i.e., the 
duration of all the tasks and the resources they require, are then given. The 
next set of instructions 

ScheduleHorizon = totalDuration;
Activity task[j in Jobs, t in Tasks](duration[j,t]);
Activity makespan(0);
UnaryResource tool[Machines];

is most interesting. The first instruction describes the schedule horizon, i.e., the 
date by which the schedule should be completed at the latest. In this application,
the schedule horizon is given as the summation of all durations, which is clearly 
an upper bound on the duration of the schedule. The next instruction declares 
the activities of the problem. Activities are first-class objects in OPL and can 
be viewed (in a first approximation) as consisting of variables representing the 
starting date, the duration, and the end date of a task, as well as the constraints 
linking them. The variables of an activity are accessed as fields of records. In 
our application, there is an activity associated with each task of each job. The 
instruction 

UnaryResource tool[Machines];

declares an array of unary resources. Unary resources are, once again, first-class 
objects of OPL; they represent resources that can be used by at most one activity 
at any one time. In other words, two activities using the same unary resource
cannot overlap in time. Note that the makespan is modeled for simplicity as an 
activity of duration zero. 
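
The notions of activity, unary resource, and schedule horizon can be illustrated on fixed start dates with a small Python sketch. In OPL the start and end dates are decision variables; here they are plain integers, and the class Activity, the function unary_ok, and the sample data are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    start: int
    duration: int

    @property
    def end(self):
        # The end date is linked to the start date and the duration.
        return self.start + self.duration

def unary_ok(activities):
    """True when no two activities on the same unary resource overlap in time."""
    ordered = sorted(activities, key=lambda a: (a.start, a.end))
    return all(nxt.start >= cur.end for cur, nxt in zip(ordered, ordered[1:]))

# Three activities placed back to back on one machine.
tasks = [Activity(0, 3), Activity(3, 2), Activity(5, 4)]
horizon = sum(a.duration for a in tasks)       # counterpart of totalDuration
makespan_end = max(a.end for a in tasks)
print(unary_ok(tasks), makespan_end, horizon)
print(unary_ok([Activity(0, 3), Activity(2, 2)]))
```

Running all activities one after another never exceeds the sum of the durations, which is why that sum is a safe schedule horizon.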

Consider now the problem constraints. The first set of constraints specifies 
that the activities associated with the problem tasks precede the makespan ac- 
tivity. The next two sets specify the precedence and resource constraints. The 
resource constraints specify which activities require which resource. Finally, the 
search procedure 

int nbMachines = ...;
range Machines 1..nbMachines;
int nbJobs = ...;
range Jobs 1..nbJobs;
int nbTasks = ...;
range Tasks 1..nbTasks;

Machines resource[Jobs,Tasks] = ...;
int+ duration[Jobs,Tasks] = ...;
int totalDuration = sum(j in Jobs, t in Tasks) duration[j,t];

ScheduleHorizon = totalDuration;
Activity task[j in Jobs, t in Tasks](duration[j,t]);
Activity makespan(0);
UnaryResource tool[Machines];

minimize
   makespan.end
subject to {
   forall(j in Jobs)
      task[j,nbTasks] precedes makespan;
   forall(j in Jobs & t in 1..nbTasks-1)
      task[j,t] precedes task[j,t+1];
   forall(j in Jobs & t in Tasks)
      task[j,t] requires tool[resource[j,t]];
};

search {
   LDSearch() {
      forall(r in Machines ordered by increasing localSlack(tool[r]))
         rank(tool[r]);
   }
}

Fig. 8. A Job-Shop Scheduling Model (jobshop.mod).

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search { 

LDSearchO { 

foralKr in Machines ordered by increasing localSlackCtool [r] ) ) 
rank (tool [r] ) ; 

} 

} 

illustrates a typical search procedure for job-shop scheduling and the use of 
limited discrepancy search (LDS) [5] as a search strategy. The search procedure 

forall(r in Machines ordered by increasing localSlack(tool[r]))
   rank(tool[r]);

consists of ranking the unary resources, i.e., choosing in which order the activi- 
ties execute on the resources. Once the resources are ranked, it is easy to find a 
solution. The procedure ranks first the resource with the smallest local slack (i.e., 
the machine that seems to be the most difficult to schedule) and then considers 
the remaining resources using a similar heuristic. The instruction LDSearch()
specifies that the search space specified by the search procedure defined above 
must be explored using limited discrepancy search. This strategy, which is effec- 
tive for many scheduling problems, assumes the existence of a good heuristic. Its 
basic intuition is that the heuristic, when it fails, probably would have found a 
solution if it had made a small number of different decisions during the search. 
The choices where the search procedure does not follow the heuristic are called 
discrepancies. As a consequence, LDS systematically explores the search tree by 
increasing the number of allowed discrepancies. Initially, a small number of dis- 
crepancies is allowed. If the search is not successful or if an optimal solution is 
desired, the number of discrepancies is increased and the process is iterated until 
a solution is found or the whole search space has been explored. Note that, be- 
sides the default depth-first search and LDS, OPL also supports best-first search, 
interleaved depth-first search, and depth-bounded limited discrepancy search. It 
is interesting to mention that this simple model solves MT10 in about 40 seconds
and MT20 in about 0.4 seconds. 
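
The iteration over an increasing number of discrepancies can be sketched as follows. This is a minimal Python illustration, not OPL's implementation: it assumes that a hypothetical function children returns the successors of a node ordered best-first by the heuristic, and that taking any successor other than the first costs one discrepancy. Each probe explores the paths with at most k discrepancies, so earlier paths are revisited when k grows.

```python
def lds_probe(node, children, is_goal, k):
    """Depth-first probe that deviates from the heuristic at most k times."""
    if is_goal(node):
        return node
    for rank, child in enumerate(children(node)):
        # The first child follows the heuristic; any other child is a discrepancy.
        cost = 0 if rank == 0 else 1
        if cost <= k:
            found = lds_probe(child, children, is_goal, k - cost)
            if found is not None:
                return found
    return None


def limited_discrepancy_search(root, children, is_goal, max_discrepancies):
    """Run probes with 0, 1, 2, ... allowed discrepancies until a goal is found."""
    for k in range(max_discrepancies + 1):
        found = lds_probe(root, children, is_goal, k)
        if found is not None:
            return found, k
    return None, None
```

With max_discrepancies set to the depth of a binary tree, the last probe covers the whole search space, which corresponds to the exhaustive case mentioned above.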



5 A Configuration Problem 

This section illustrates OPLScript, a script language for controlling and composing 
OPL models. It shows how to solve an application consisting of a sequence 
of two models: a constraint programming model and an integer program. The 
application is a configuration problem, known as Vellino’s problem, which is a 
small but good representative of many similar applications. For instance, complex
sport scheduling applications can be solved in a similar fashion. 

Given a supply of components and bins of various types, Vellino’s problem 
consists of assigning the components to the bins so that the bin constraints are 
satisfied and the smallest possible number of bins is used. There are five types 
of components, i.e., glass, plastic, steel, wood, and copper, and three types of
bins, i.e., red, blue, and green. The bins must obey a variety of configuration
constraints. Containment constraints specify which components can go into which
bins: red bins cannot contain plastic or steel, blue bins cannot contain wood or 
plastic, and green bins cannot contain steel or glass. Capacity constraints specify 
a limit for certain component types for some bins: red bins contain at most one 
wooden component and green bins contain at most two wooden components. 
Finally, requirement constraints specify some compatibility constraints between 
the components: wood requires plastic, glass excludes copper, and copper
excludes plastic. In addition, we are given an initial capacity for each bin type
(red bins hold 3 components, blue bins 1, and green bins 4) and a demand for
each component (1 glass, 2 plastic, 1 steel, 3 wood, and 2 copper
components).



Model bin("genBin.mod","genBin.dat");
import enum Colors bin.Colors;
import enum Components bin.Components;
struct Bin { Colors c; int n[Components]; };
int nbBin := 0;
Open Bin bins[1..nbBin];
while bin.nextSolution() do {
   nbBin := nbBin + 1;
   bins.addh();
   bins[nbBin].c := bin.c;
   forall(c in Components)
      bins[nbBin].n[c] := bin.n[c];
}
Model pro("chooseBin.mod","chooseBin.dat");
if pro.solve() then
   cout << "Solution at cost: " << pro.objectiveValue() << endl;

Fig. 9. A Script to Solve Vellino’s Problem (vellino.osc).



The strategy to solve this problem consists of generating all the possible bin 
configurations and then to choose the smallest number of them that meet the 
demand. This strategy is implemented using the script depicted in Figure 9 and 
two models genBin.mod and chooseBin.mod depicted in Figures 10 and 11. It 
is interesting to study the script in detail at this point. The instruction 

Model bin("genBin.mod","genBin.dat");

declares the first model. Models are, of course, a fundamental concept of OPLScript:
they support a variety of methods (e.g., solve and nextSolution), their data 
can be accessed as fields of records, and they can be passed as parameters to 
procedures. The instructions

enum Colors ...;
enum Components ...;
int capacity[Colors] = ...;
int maxCapacity = max(c in Colors) capacity[c];
var Colors c;
var int n[Components] in 0..maxCapacity;
solve {
   0 < sum(c in Components) n[c] <= capacity[c];
   c = red => n[plastic] = 0 & n[steel] = 0 & n[wood] <= 1;
   c = blue => n[plastic] = 0 & n[wood] = 0;
   c = green => n[glass] = 0 & n[steel] = 0 & n[wood] <= 2;
   n[wood] >= 1 => n[plastic] >= 1;
   n[glass] = 0 \/ n[copper] = 0;
   n[copper] = 0 \/ n[plastic] = 0;
};

Fig. 10. Generating the Bins in Vellino’s Problem (genBin.mod).



import enum Colors;
import enum Components;
struct Bin { Colors c; int n[Components]; };
import int nbBin;
import Bin bins[1..nbBin];
range R 1..nbBin;
int demand[Components] = ...;
int maxDemand = max(c in Components) demand[c];
var int produce[R] in 0..maxDemand;
minimize
   sum(b in R) produce[b]
subject to
   forall(c in Components)
      sum(b in R) bins[b].n[c] * produce[b] = demand[c];

Fig. 11. Choosing the Bins in Vellino’s Problem (chooseBin.mod).

import enum Colors bin.Colors;
import enum Components bin.Components;



import the enumerated types from the model to the script; these enumerated 
types will be imported by the second model as well. The instructions 

struct Bin { Colors c; int n[Components]; };
int nbBin := 0;
Open Bin bins[1..nbBin];

declare a variable to store the number of bin configurations and an open array 
to store the bin configurations themselves. Open arrays are arrays that can grow 
and shrink dynamically during the execution. The instructions 

while bin.nextSolution() do {
   nbBin := nbBin + 1;
   bins.addh();
   bins[nbBin].c := bin.c;
   forall(c in Components)
      bins[nbBin].n[c] := bin.n[c];
}

enumerate all the bin configurations and store them in the bin array in model 
pro. Instruction bin.nextSolution() returns the next solution (if any) of the
model bin. Instruction bins.addh() increases the size of the open array (addh
stands for “add high”). The subsequent instructions access the model data and
store them in the open array. Once this step is completed, the second model is 
executed and produces a solution at cost 8. 

Model genBin.mod specifies how to generate the bin configurations: It is a 
typical constraint program using logical combinations of constraints that should 
not raise any difficulty. Model chooseBin.mod is an integer program that chooses
and minimizes the number of bins. This model imports the enumerated types as 
mentioned previously. It also imports the bin configurations using the instruc- 
tions 

import int nbBin;
import Bin bins[1..nbBin];

It is important to stress that both models can be developed and tested independently
since import declarations can be initialized in a data file when a model is
run in isolation (i.e., not from a script). This makes the overall design composi- 
tional. 
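
The generation step can also be checked by brute force outside OPL. The following Python sketch is only an illustration under stated assumptions: the capacities are the ones quoted in the problem statement (the contents of genBin.dat are not shown in this paper), the names is_valid and generate_bins are hypothetical, and plain enumeration stands in for the constraint solver.

```python
from itertools import product

COMPONENTS = ("glass", "plastic", "steel", "wood", "copper")
# Capacities as quoted in the text; the actual data file is not reproduced here.
CAPACITY = {"red": 3, "blue": 1, "green": 4}


def is_valid(color, n):
    """Check one candidate bin against the constraints listed in genBin.mod."""
    if not 0 < sum(n.values()) <= CAPACITY[color]:
        return False
    if color == "red" and not (n["plastic"] == 0 and n["steel"] == 0 and n["wood"] <= 1):
        return False
    if color == "blue" and not (n["plastic"] == 0 and n["wood"] == 0):
        return False
    if color == "green" and not (n["glass"] == 0 and n["steel"] == 0 and n["wood"] <= 2):
        return False
    if n["wood"] >= 1 and n["plastic"] < 1:    # wood requires plastic
        return False
    if n["glass"] > 0 and n["copper"] > 0:     # glass excludes copper
        return False
    if n["copper"] > 0 and n["plastic"] > 0:   # copper excludes plastic
        return False
    return True


def generate_bins():
    """Enumerate every (color, counts) pair that satisfies all constraints."""
    max_capacity = max(CAPACITY.values())
    bins = []
    for color in CAPACITY:
        for counts in product(range(max_capacity + 1), repeat=len(COMPONENTS)):
            n = dict(zip(COMPONENTS, counts))
            if is_valid(color, n):
                bins.append((color, n))
    return bins
```

The resulting list plays the role of the open array bins in the script: each entry is one configuration that the second model may choose any number of times.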



6 Conclusion 

The purpose of this paper was to review, through four applications, a number 
of constraint programming features of OPL to give a basic understanding of the 
expressiveness of the language. These features include very high-level algebraic
notations and data structures, a rich constraint programming language supporting
logical, higher-level, and global constraints, support for scheduling and
resource allocation problems, and search procedures and strategies. The paper 
also introduced briefly OPLScript, a script language to control and compose OPL 
models. The four applications presented in this paper should give a preliminary, 
although very incomplete, understanding of how OPL can decrease development 
time significantly.

Abstract. We introduce the most recent and advanced implementation
of constraint handling rules (CHR) in a logic programming language, 
which improves both on previous implementations (in terms of complete- 
ness, flexibility and efficiency) and on the principles that should guide 
such a Prolog implementation consisting of a runtime system and a com- 
piler. The runtime system utilizes attributed variables for the realization 
of the constraint store with efficient retrieval and update mechanisms. 
Rules describing the interactions between constraints are compiled into 
Prolog clauses by a multi-phase compiler, the core of which comprises 
a small number of compact code generating templates in the form of 
definite clause grammar rules. 

Keywords: Logic and constraint programming. Implementation and 
compilation methods. 



1 Introduction 

In the beginning of constraint logic programming (CLP), constraint solving was 
“hard-wired” in a built-in constraint solver written in a low-level language. While
efficient, this so-called “black-box” approach makes it hard to modify a solver 
or build a solver over a new domain, let alone debug, reason about and analyze 
it. This is a problem, since one lesson learned from practical applications is 
that constraints are often heterogeneous and application-specific. Consequently, 
several proposals have been made to allow for more flexibility and customization
of constraint systems (“glass-box” or even “no-box” approaches): 

— Demons, forward rules and conditionals in CHIP [6] allow the definition of 
propagation of constraints in a limited way. 

* Part of this work was performed while visiting CWG at LMU with financial support
from DFG.

— Constraint combinators in cc(FD) [13] allow one to build more complex
constraints from simpler constraints.

— Constraints connected to a Boolean variable in BNR-Prolog [2] and “nested
constraints” [31] allow one to express any logical formula over primitive
constraints.

— Indexicals in clp(FD) [5] allow one to implement constraints over finite domains
at a medium level of abstraction.

— Meta- and attributed variables [26], [21], [15] allow one to attach constraints to
variables at a low level of abstraction.

It should be noted that all the approaches but the last can only extend a solver 
over a given, specific constraint domain, typically finite domains. The expressive 
power to realize other (application-specific) constraint domains is only provided 
by the last approach. 

Attributed variables provide direct access storage locations for properties as- 
sociated with variables. When such variables are unified, their attributes have to 
be manipulated. Thus attributed variables make unification user-definable [15], 
[16], [17]. Attributed variables require roughly the same implementation effort 
as hard-wired delay (suspension) and coroutining mechanisms found in earlier 
Prolog implementations, while being more general. And indeed, attributed vari- 
ables nowadays serve as the primary low-level construct for implementing sus- 
pension (delay) mechanisms and constraint solver extensions in many constraint 
logic programming languages, e.g. SICStus [4] and ECLiPSe [3] Prolog. However,
writing constraints this way is tedious, a kind of “constraint assembler”
programming. 

If there already is a powerful constraint assembler, one may wonder what 
an associated high-level language could look like. Our proposal is a declara- 
tive language extension especially designed for writing constraint solvers, called 
constraint handling rules (CHR) [10], [12], [18], [11]. With CHR, one can intro- 
duce user-defined constraints into a given high level host language, be it Prolog 
or Lisp. As a language extension, CHR themselves are only concerned with constraints;
all auxiliary computations are performed in the host language. CHR
have been used in dozens of projects worldwide to encode dozens of constraint 
handlers (solvers), including new domains such as terminological and temporal 
reasoning. If comparable hard-wired constraint solvers are available, the price to 
pay for the flexibility of CHR is often within an order of magnitude in runtime. 
The performance gap can in many cases be eliminated by tailoring the CHR 
constraints to the specifics of the class of applications at hand. 

CHR is essentially a committed-choice language consisting of guarded rules 
that rewrite constraints into simpler ones until they are solved. CHR can define 
both simplification of and propagation over user-defined constraints. Simplifica- 
tion replaces constraints by simpler constraints while preserving logical equiva- 
lence. Propagation adds new constraints which are logically redundant but may 
cause further simplification. CHR can be seen as a generalization of the various 
CHIP [6] constructs for user-defined constraints.

In contrast to the family of the general-purpose concurrent logic programming
languages [29], concurrent constraint languages [28] and the ALPS [23]
framework, CHR are a special-purpose language concerned with defining declarative
objects, constraints, not procedures in their generality. In another sense,
CHR are more general, since they allow for multiple heads, i.e. conjunctions of
constraints in the head of a rule. Multiple heads are a feature that is essential
in solving conjunctions of constraints. With single-headed CHR alone, unsatisfiability
of constraints could not always be detected (e.g. X<Y, Y<X) and global
constraint satisfaction could not be achieved. Probably the most distinguishing
functionality of CHR is that they act as a powerful iteration, retrieval, and
update mechanism over the constraint store, a data structure holding constraints.

The first implementation of CHR in 1991 was an interpreter written in
ECLiPSe Prolog. The CHR language was then implemented in 1993 in
Common LISP at the German Research Institute for Artificial Intelligence [14]
and in 1994 as a library of ECLiPSe [9], [10]. A CHR interpreter was written
in the concurrent logical object-oriented constraint language OZ [32] in 1996.
Independent of our work, a new experimental prototype of CHR has been implemented
recently in ECLiPSe 4.0 [30].

CHR are typically realized as a library containing a compiler, runtime system 
and solvers written in CHR. With Prolog as the host language, the idea is to 
realize the CHR constraint store through attributed variables. Rule application 
compiles into Prolog clauses which inspect and update the constraint store at 
runtime. Thus CHR can also be understood as a powerful means to manipulate 
the attributes of variables in a declarative high-level fashion. In this paper we 
introduce the most recent and advanced implementation of CHR in SICStus 
Prolog [18], which improves both on the previous implementation [10] in terms 
of completeness, flexibility and efficiency and on the principles that should guide 
such an implementation [9]. The new release also includes about 30 constraint 
solvers written in CHR. 

For the user, the new release of CHR improves over older versions in the 
following aspects: 

— The number of heads in a rule is no longer limited to two.

— Guards now support Ask and Tell as in concurrent constraint languages.

— Code runs generally about twice as fast as in older versions. 

— For more control, rules are compiled in textual order. 

— Compilation is now transparent to the user, on-the-fly when loading. 

— Improved set of built-in predicates for advanced CHR users. 

— Constant time access to constraints of one type for elevated performance. 

— New options and pragmas for powerful compiler optimizations. 

— Runtime system includes a stepper for Prolog-like debugging. 

Similar issues, i.e. compilation of committed-choice languages into Prolog, 
have been investigated before, be it translating GHC [33], implementations of 
delay declarations [25] or the efficient implementation of QD-Janus [8]. Today, we
benefit from more powerful programming constructs, in particular customizable 
suspension mechanisms provided by attributed variables. CHR specific topics 
are multiple heads and propagation rules.

Overview of this Paper. We quickly recapitulate the syntax and semantics of CHR.
Then we describe the three phases of the new compilation scheme and the runtime
system for CHR. We conclude with a comparison with the previous implementation.
This paper is a revised version of [19].

An example will guide us through the paper. Even though it does not define 
a typical constraint, we chose it for didactic reasons. It is small but can still 
illustrate the various stages of our compilation scheme. We use Prolog syntax in 
this paper. 

Example 1 (Primes). We implement the sieve of Eratosthenes to compute primes 
in a way reminiscent of the “chemical abstract machine” [1]: The constraint 
candidates(N) generates candidates for prime numbers, prime(M), where M is
between 1 and N. The candidates react with each other such that each number
absorbs multiples of itself. In the end, only prime numbers remain.

candidates(1) <=> true.

generate @ candidates(N) <=> N>1 | M is N-1, prime(N), candidates(M).

sieve @ prime(I) \ prime(J) <=> J mod I =:= 0 | true.

The first rule says that the number 1 is not a good candidate for a prime;
candidates(1) is thus rewritten into true, a constraint that is always satisfied
and therefore has no effect. Note that head matching is used in CHR, so the
first rule will only apply to candidates(1). A constraint for candidates with
a free variable, like candidates(X), will suspend (delay).

The generate rule generates a candidate prime(N) and proceeds recursively
with the next smaller number, provided the guard (precondition, test) N>1 is
satisfied.

The third, multi-headed rule named sieve reads as follows: If there is a
constraint prime(I) and some other constraint prime(J) such that J mod I
=:= 0 holds, i.e. J is a multiple of I, then keep prime(I) but remove prime(J)
and execute the body of the rule, true.
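
The combined effect of the three rules can be imitated outside CHR. The following Python sketch is an illustration only: it represents the store as a plain list of integers, it first unfolds candidates(N) completely and only then applies the sieve rule, whereas a CHR implementation interleaves both steps, and the function names are hypothetical.

```python
def candidates(n):
    """Unfold candidates(N) into prime(N), ..., prime(2) as the generate rule does."""
    store = []
    while n > 1:          # guard N>1
        store.append(n)   # the body adds prime(N)
        n -= 1            # and continues with candidates(N-1)
    return store          # candidates(1) <=> true adds nothing


def sieve(store):
    """Apply prime(I) \\ prime(J) <=> J mod I =:= 0 | true until no pair matches."""
    store = list(store)
    changed = True
    while changed:
        changed = False
        for a, i in enumerate(store):
            for b, j in enumerate(store):
                # Two distinct constraints whose values satisfy the guard.
                if a != b and j % i == 0:
                    del store[b]      # prime(J) is removed, prime(I) is kept
                    changed = True
                    break
            if changed:
                break
    return store
```

Each rule application removes one element, so the loop terminates; what remains are the numbers that no other remaining number divides.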

2 Syntax and Semantics 

We assume some familiarity with (concurrent) constraint (logic) programming, 
e.g. [29], [13], [28], [22], [24]. As a special purpose language, CHR extend a host 
language with (more) constraint solving capabilities. Auxiliary computations in 
CHR programs are executed as host language statements. Here the host lan- 
guage is (SICStus) Prolog. For more formal and detailed syntax and semantics 
of constraint handling rules see [12], [11]. 

2.1 Syntax 

Definition 1. There are three kinds of CHR. A simplification CHR is of the
form¹

[Name '@'] Head1, ..., HeadN '<=>' [Guard '|'] Body.

where the rule has an optional Name, which is a Prolog term, and the
multi-head Head1, ..., HeadN is a conjunction of CHR constraints, which are Prolog
atoms. The guard is optional; if present, Guard is a Prolog goal excluding CHR
constraints; if not present, it has the same meaning as the guard 'true |'. The
body Body is a Prolog goal including CHR constraints.

¹ For simplicity, we omit syntactic extensions like pragmas which are not relevant for
this paper.

A propagation CHR is of the form 

[Name '@'] Head1, ..., HeadN '==>' [Guard '|'] Body.

A simpagation CHR is a combination of the above two kinds of rule; it is of
the form

[Name '@'] Head1, ... '\' ..., HeadN '<=>' [Guard '|'] Body.

where the symbol '\' separates the head constraints into two nonempty parts.

A simpagation rule combines simplification and propagation in one rule. The 
rule HeadsK \ HeadsR <=> Body is equivalent to the simplification rule HeadsK, 
HeadsR <=> HeadsK, Body, i.e. HeadsK is kept while HeadsR is removed. How- 
ever, the simpagation rule is more compact to write, more efficient to execute 
and has better termination behaviour than the corresponding simplification rule. 



2.2 Semantics 

Declaratively², a rule relates heads and body provided the guard is true. A
simplification rule means that the heads are true if and only if the body is 
satisfied. A propagation rule means that the body is true if the heads are true. 

In this paper, we are interested in the operational semantics of CHR in 
actual implementations. A CHR constraint is implemented as both code (a Prolog 
predicate) and data (a Prolog term) in the constraint store, which is a data 
structure holding constraints. Every time a CHR constraint is posted (executed) 
or woken (reconsidered), it triggers checks to determine the applicability of the 
rules it appears in. Such a constraint is called (currently) active, while the other 
constraints in the constraint store that are not executed at the moment are called 
(currently) passive. 

Heads. For each CHR, one of its heads is matched against the constraint. 
Matching succeeds if the constraint is an instance of the head, i.e. the head serves 
as a pattern. If a CHR has more than one head, the constraint store is searched 
for partner constraints that match the other heads. If the matching succeeds, 
the guard is executed. Otherwise the next rule is tried. 
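
Head matching is one-way: the head may be instantiated, the constraint may not. The following Python sketch illustrates this under simplifying assumptions that are not taken from the implementation described here: terms are tuples or constants, unbound variables are instances of a hypothetical class Var, and the function match returns the bindings of the head variables or None.

```python
class Var:
    """An unbound logical variable (hypothetical minimal representation)."""

    def __init__(self, name):
        self.name = name


def match(pattern, term, bindings=None):
    """Return bindings that make pattern equal to term, or None.

    Only variables of the pattern (the rule head) are bound; the term
    (the constraint) is never instantiated.
    """
    bindings = {} if bindings is None else bindings
    if isinstance(pattern, Var):
        if pattern in bindings:
            same = bindings[pattern] is term or bindings[pattern] == term
            return bindings if same else None
        bindings[pattern] = term
        return bindings
    if isinstance(term, Var):
        return None   # matching would have to bind a variable of the constraint
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            if match(p, t, bindings) is None:
                return None
        return bindings
    return bindings if pattern == term else None
```

With this definition the head candidates(1) matches the constraint candidates(1) but not candidates(X) for a free X, which is why the latter suspends.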

Guard. A guard is a precondition on the applicability of a rule. The guard
either succeeds or fails. A guard succeeds if the execution succeeds without
causing an instantiation error³ and without touching a variable from the heads.
A variable is touched if it takes part in a unification or gets more constrained by
a built-in constraint. If the guard succeeds, the rule applies. Otherwise it fails
and the next rule is tried.

² Unlike general committed-choice programs, CHR programs can be given a declarative
semantics since they are only concerned with defining constraints, not procedures in
their generality.

Body. If the firing CHR is a simplification rule, the matched constraints 
are removed from the store and the body of the CHR is executed. Similarly for 
a firing simpagation rule, except that the constraints that matched the heads 
preceding '\' are kept. If the firing CHR is a propagation rule, the body of
the CHR is executed without removing any constraints. It is remembered that 
the propagation rule fired, so it will not fire again (and again) with the same 
constraints. Since the currently active constraint has not been removed, the next 
rule is tried. 

Suspension. If all rules have been tried and the active constraint has not 
been removed, it suspends (delays) until a variable occurring in the constraint 
is touched. Here suspension means that the constraint is inserted into the con- 
straint store as data. 



3 The Compiler 

The compiler is written in (SICStus) Prolog [18] and translates CHR into Prolog
on-the-fly, while the file is loaded (consulted). Its kernel consists of a definite
clause grammar that generates the target instructions (clauses) driven by templates.
We will use Example 1 to explain the three phases of the compiler: (1)
parsing, (2) translating CHR into clauses using templates, and (3) partial evaluation
using macros. Of course, phase (2) is the essential one that encodes the
algorithm.



3.1 Parsing Phase 

Using the appropriate operator declarations, a CHR can be read and written as a 
Prolog term. Hence parsing basically reduces to computing information from the 
parse tree and to producing a canonical form of the rules. Information needed 
from the parse tree includes: 

— The set of global variables, i.e. those that appear in the heads of a rule. 

— The set of variables shared between the heads. 

In the canonical form of the rules, 

— each rule is associated with a unique identifier, 

— rule heads are collected into two lists (named Keep and Remove), and 

— guard and body are made explicit with defaults applied. 

³ A built-in predicate of Prolog complains about free variables where it needs
instantiated ones.

One list, called Keep, contains all head constraints that are kept when the rule
is applied, the other list, called Remove, contains all head constraints that are 
removed. Lists may be empty. As a result of this representation, simplification, 
propagation and simpagation rules can be treated uniformly. 

Example 2 (Primes, contd.). The canonical form of the rules for the prime num- 
ber example is given below. 



% rule(Id, Keep,       Remove,           Guard,          Body)
rule( 1, [],          [candidates(1)],  true,           true).
rule( 2, [],          [candidates(A)],  A>1,            (B is A-1, prime(A), candidates(B))).
rule( 3, [prime(A)],  [prime(B)],       B mod A =:= 0,  true).
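
The uniform treatment of the three kinds of rules can be sketched as follows. This Python fragment is an illustration, not the compiler's code: heads, guards, and bodies are kept as plain strings, and the function name canonical and its parameters are hypothetical.

```python
def canonical(rule_id, kind, heads, guard="true", body="true", kept=0):
    """Return (Id, Keep, Remove, Guard, Body) for one rule.

    kind is 'simplification', 'propagation' or 'simpagation'; for a
    simpagation rule, kept is the number of heads written before '\\'.
    """
    if kind == "simplification":
        keep, remove = [], list(heads)
    elif kind == "propagation":
        keep, remove = list(heads), []
    elif kind == "simpagation":
        if not 0 < kept < len(heads):
            raise ValueError("both parts of a simpagation head must be nonempty")
        keep, remove = list(heads[:kept]), list(heads[kept:])
    else:
        raise ValueError("unknown kind of rule: " + kind)
    # A missing guard or body defaults to true, as in the canonical form.
    return (rule_id, keep, remove, guard, body)
```

Applied to the third rule of the primes example, the call puts prime(A) into Keep and prime(B) into Remove, which mirrors the last line of the table above.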



3.2 Translation Phase 

Each CHR constraint compiles into Prolog clauses that try the constraint with 
all rules in whose heads it occurs. The resulting compilation process is nonlocal in 
the sense that a CHR constraint may appear in various head positions in various 
rules. Each occurrence of a CHR constraint in the head of a rule gives rise to 
one clause for that constraint. The clause head contains the active constraint, 
while the clause body does the following: 

— match formal parameters to actual arguments of head constraint 

— find and match partner constraints 

— check the guard 

— commit via cut 

— remove matched constraints if required 

— execute body of rule 

We first illustrate the compilation with a simple example, a single-headed 
simplification CHR, then we consider general cases of arbitrary multi-headed 
rules. 

Example 3 (Primes, contd.). For the constraint candidates/1 the compiler generates
the following intermediate code (edited for readability).

% in rule candidates(1) <=> true
candidates(A) :-
    match([1], [A]),
    check_guard([], true),
    !,
    true.

% in rule candidates(N) <=> N>1 | M is N-1, prime(N), candidates(M)
candidates(A) :-
    match([C], [A]),
    check_guard([C], C>1),
    !,
    D is C-1,
    prime(C),
    candidates(D).

% if no rule applied, suspend the constraint on its variables
candidates(A) :-
    suspend(candidates(A)).

The predicate match(L1,L2) matches the actual arguments L2 against the
formal parameters L1. The predicate check_guard(VL,G) checks the guard G.
check_guard/2 fails as soon as the global variables (list VL) are touched⁴.

When no rule applies, the last clause inserts the constraint into the constraint
store using a suspension mechanism. It allocates the suspension data structure 
and associates it with each variable occurring in the constraint. Touching any 
such variable will wake the constraint. 

The real challenge left is to implement multi-headed CHR. In a naive imple- 
mentation of a rule, the constraint store is queried for the cross-product of match- 
ing constraints. For each tuple in the cross-product the guard is checked in the 
corresponding environment. If the guard is satisfied, constraints that matched 
heads in the Remove list are removed from the store and the instance of the rule’s 
body is executed. Note that the removal of constraints removes tuples from the 
cross-product. 

Our implementation computes only those tuples in the cross-product that 
are really needed (as in [9]). Moreover, nondeterministic enumeration of the 
constraints is preferred over deterministic iteration whenever possible, because 
Prolog is good at backtracking [20].

For each head constraint in a rule the compiler does the following: It is deleted 
from the Keep or Remove list, respectively, and it is rendered as the active one. 
Whether the active constraint is removed when the rule applies, and whether 
any other head constraints are removed, leads to the following three prototypical 
cases, each covered by a code generating template in the compiler: 

1. Case: Active constraint from Remove list

2. Case: Active constraint from Keep list, Remove list nonempty

3. Case: Active constraint from Keep list, Remove list empty

Interestingly, the three cases do not directly correspond to the three kinds of 
CHR. 
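
The case distinction can be written down in a few lines. The following Python sketch is an illustration with hypothetical names, assuming that head occurrences are distinguishable strings and that keep and remove are the canonical head lists of a rule.

```python
def template_case(keep, remove, active):
    """Pick the template (1, 2 or 3) for one head occurrence made active."""
    if active in remove:
        return 1                   # the active constraint will be removed
    if active not in keep:
        raise ValueError("active constraint is not a head of this rule")
    return 2 if remove else 3      # kept; are any partners removed?


def cases_of_rule(keep, remove):
    """Set of templates needed to compile every head occurrence of a rule."""
    return {template_case(keep, remove, head) for head in keep + remove}
```

Under these assumptions a simplification rule only needs case 1, a propagation rule only case 3, and a simpagation rule needs cases 1 and 2, which shows why the cases and the kinds of rules do not line up one to one.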



Case 1. Active constraint from Remove list. The active head constraint
is to be removed if the rule applies, so the rule under consideration is either
a simplification or simpagation rule. It can be applied at most once with the
current active constraint. The search for the partner constraints in this case can
be done through nondeterministic enumeration. Here is the template as a DCG
grammar rule, slightly abridged. The predicate ndmpc generates the code to
nondeterministically enumerate and match the partners, one by one.

⁴ In most Prolog implementations, it is more efficient to re-execute head matching and
guards instead of suspending all of them and executing them incrementally.

compile(remove(Active), Remove, Keep, Guard, Body, ...) -->
    % compiler code
    {
      Active =.. [_|Args],
      same_length(Args, Actual),
      ndmpc(Remove, RemoveCode, RemCs, ...),
      ndmpc(Keep, KeepCode, ...)
    },
    % generated code
    [(constraint(head(F/A,R-N), args(Actual)) :-
        match(Args, Actual),
        RemoveCode,                % Identify Remove partners
        KeepCode,                  % Identify Keep partners
        check_guard(Vars, Guard),
        !,
        remove_constraints(RemCs),
        Body
    )].



The variables F,A,R and N stand for functor, arity of the constraint, rule 
identifier and number of head in rule, respectively. 

Example 4 (Primes, contd.). The second occurrence of prime/1 in rule 3 of
Example 1 matches this template, and here is its instantiation:

% prime(I) \ prime(J) <=> J mod I =:= 0 | true.
constraint(head(prime/1, 3-2), args([A])) :-
    match([C], [A]),
    % RemoveCode (for one partner constraint)
    get_constr_via([], Constraints),
    nd_init_iteration(Constraints, prime/1, Candidate),
    get_args(Candidate, [F]),
    match([C]-[G], [C]-[F]),
    % KeepCode (no partner constraints to be kept in this case)
    true,
    % Guard
    check_guard([G,C], (C mod G =:= 0)),
    !,
    remove_constraints([]),        % no constraints to remove here
    % Body
    true.







The predicate get_constr_via(VL, Cs) returns a handle Cs to the constraints
suspended on a free variable occurring in the list VL. If there is no variable
in VL, it returns a handle to all the constraints in the store. The predicate
nd_init_iteration(Cs, F/A, Candidate) nondeterministically returns a candidate
constraint with functor F and arity A through the handle Cs.
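
The partner search of Case 1 can also be pictured outside Prolog. The following Python sketch is an illustration only: the list-based store and the names `candidates` and `apply_rule3_active_removed` are assumptions made here, not part of the CHR runtime, and a generator merely stands in for Prolog backtracking.

```python
from typing import Iterator, List, Optional

def candidates(store: List[int]) -> Iterator[int]:
    # Stands in for nd_init_iteration/3: yield the stored prime/1 arguments one by one.
    yield from store

def apply_rule3_active_removed(active_j: int, store: List[int]) -> Optional[int]:
    # Rule 3 of the primes example, prime(I) \ prime(J) <=> J mod I =:= 0 | true,
    # tried with prime(J) as the active constraint (assumed not to be in `store`).
    for i in candidates(store):
        if active_j % i == 0:   # guard holds
            return i            # commit: the rule fires at most once
    return None                 # no partner found: the rule does not apply
```

Returning at the first partner that passes the guard mirrors the cut in the template: once the active constraint is removed, no further partners need to be enumerated.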



Case 2. Active constraint from Keep list, Remove list nonempty

This case applies only if there is at least one constraint to be removed, but
the active constraint will be kept. It can only originate from a simpagation
rule. Since the active constraint is kept, one has to continue looking for
applicable rules, even after the rule has been applied. However, since at least
one partner constraint will have been removed, the same rule will only be
applicable again with another constraint from the store in place of the removed
one. Therefore, we can deterministically iterate over the constraints that are
candidates for matching the corresponding head from Remove, while the remaining
partners can be found via nondeterministic enumeration as before. At the end of
the iteration, we have to continue with the remaining rules for the active
constraint.
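
As a rough illustration of this deterministic iteration, the Python sketch below applies rule 3 of the primes example with prime(I) as the active, kept constraint. The list-based store and the function name `apply_rule3_active_kept` are assumptions for illustration; the generated Prolog code works on suspensions instead.

```python
from typing import List

def apply_rule3_active_kept(active_i: int, store: List[int]) -> List[int]:
    # Rule 3, prime(I) \ prime(J) <=> J mod I =:= 0 | true, with prime(I) active.
    # The active constraint is kept, so every candidate prime(J) is visited once.
    remaining = []
    for j in store:                # deterministic iteration over the candidates
        if j % active_i == 0:      # guard holds: the rule applies
            continue               # the partner prime(J) is removed
        remaining.append(j)        # rule did not apply: the candidate stays
    return remaining               # afterwards, the next rule head is tried
```

Unlike Case 1, the loop does not stop at the first success, because the kept active constraint may remove several partners in turn.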

Example 5 (Primes, contd.). For space reasons, we just present a simple instance
of the template, originating from the first occurrence of prime/1 in rule 3 (for
readability with the predicate already flattened, as described in Section 3.3):

    % rule prime(I) \ prime(J) <=> J mod I =:= 0 | true.
    prime(A, B) :-
        get_constr_via([], C),           % get constraints from store
        init_iteration(C, prime/1, D),   % get partner candidates
        !,
        prime(D, B, A).                  % try to apply the rule

    prime(A, B, C) :-
        iteration_last(A),               % no more partner candidate
        prime_1(C, B).                   % try next rule head
    prime(A, B, C) :-
        iteration_next(A, D, E),         % try next partner candidate
        (   get_args(D, [F]),
            match([C]-[G], [C]-[F]),
            check_guard([C,G], (G mod C =:= 0))
        ->                               % rule applies
            remove_constraints([D]),     % remove the partner from store
            true
        ;                                % rule did not apply
            true
        ),
        prime(E, B, C).                  % in any case, try same rule
                                         % with another partner candidate
    prime_1(C, B) :- ...                 % code to try next rule head



Case 3. Active constraint from Keep list, Remove list empty

This case originates from propagation rules. Since no constraint will be
removed, all possible combinations of matching constraints have to be tried.
The rule under consideration may apply with each combination. Therefore, all
the partners (not just one as in the previous case) have to be searched through
nested deterministic iteration. No matter if and how often the rule was
applicable, at the end we have to continue with the remaining rules for the
active constraint as in Case 2.

Example 6. This propagation rule is part of an interval solver. X::Min:Max
constrains X to be within lower and upper bounds Min and Max.

    X le Y, X::MinX:MaxX, Y::MinY:MaxY ==> X::MinX:MaxY, Y::MinX:MaxY.

The propagation rule produces roughly the following code for X le Y. 

    X le Y :- le_1(X, Y).                    % active constraint (X le Y)

    le_1(X, Y) :-
        get_constr_via([X], CXs),            % get constraints on X
        init_iteration(CXs, ::/2, PCXs),     % get partner candidates
        !,
        le_1_0(PCXs, X, Y).                  % try to apply the rule
    le_1(X, Y) :-                            % rule was not applicable at all
        le_2(X, Y).                          % continue with next rule

    le_2(X, Y) :-                            % no next rule
        suspend(X le Y).                     % done, suspend the constraint

    le_1_0(PCXs, X, Y) :-                    % outer loop for X::MinX:MaxX
        iteration_last(PCXs),                % no more partner candidate
        le_2(X, Y).                          % continue with next rule
    le_1_0(PCXs, X, Y) :-
        iteration_next(PCXs, CX, PCXs1),     % try next partner candidate for X
        (   get_args(CX, ...), match(...),   % match arguments
            get_constr_via([Y], CYs),        % constraints on Y for next head
            init_iteration(CYs, ::/2, PCYs)
        ->
            le_1_1(PCYs, PCXs1, X, Y)        % try to apply the rule
        ;
            le_1_0(PCXs1, X, Y)              % try next partner candidate for X
        ).

    le_1_1(PCYs, PCXs, X, Y) :-              % inner loop for Y::MinY:MaxY
        iteration_last(PCYs),                % no more partner candidate for Y
        le_1_0(PCXs, X, Y).                  % continue with outer loop for X
    le_1_1(PCYs, PCXs, X, Y) :-
        iteration_next(PCYs, CY, PCYs1),     % try next partner candidate for Y
        (   get_args(CY, ...), match(...)    % match arguments
        ->                                   % rule applies finally
            X::MinX:MaxY, Y::MinX:MaxY,      % rule body
            le_1_1(PCYs1, PCXs, X, Y)        % continue, find another Y partner
        ;                                    % rule did not apply
            le_1_1(PCYs1, PCXs, X, Y)        % continue, find another Y partner
        ).
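
The nested iteration of Case 3 can be summarised by the following Python sketch. It is a simplified illustration under stated assumptions: the dictionary-based store, the `Interval` type and the function `propagate_le` are hypothetical, argument matching is reduced to tuple unpacking, and the propagation history that prevents repeated firing is left out.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (Min, Max)

def propagate_le(x: str, y: str,
                 store: Dict[str, List[Interval]]) -> List[Tuple[str, Interval]]:
    # Active constraint X le Y: try every pair of interval constraints on X and Y.
    derived = []
    for (min_x, _max_x) in store.get(x, []):        # outer loop: X::MinX:MaxX
        for (_min_y, max_y) in store.get(y, []):    # inner loop: Y::MinY:MaxY
            # Rule body as printed in Example 6: X::MinX:MaxY, Y::MinX:MaxY
            derived.append((x, (min_x, max_y)))
            derived.append((y, (min_x, max_y)))
    return derived  # the active constraint then continues with the next rule
```

Since nothing is removed, both loops always run to completion, which is why the generated code needs one iteration predicate per partner head.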
3.3 Partial Evaluation Phase

The translation granularity was chosen so that the generated code would roughly 
run as is, with little emphasis on efficiency coming from local optimizations and 
specializations. These are performed in the final, third phase of the compiler 
using a simple instance of partial evaluation (PE). It is performed by using 
macros as they are available in most Prolog systems, e.g. [4]. In contrast to 
approaches that address all aspects of a language in a partial evaluator such as 
Mixtus [27], our restricted form of PE can be realized with an efficiency that 
meets the requirements of a production compiler. 

The main compiler macros provide the following functionality:

— The generic predicates steering the iteration over partner constraints are
specialized with respect to a particular representation of these multisets.

— Recursions (typically iterations over lists) that are definite at compile time
are unfolded at compile time.

— As in [33], head matching is specialized into unification instructions guarded
by nonvar/1 tests.

— The intermediate code uses redundant function symbols for the convenience
of the compiler writers, e.g. to keep object, compiler and runtime-system
variables visually apart. The redundant function symbols also help in
type-checking the compiler. Redundant function symbols are absent in the target
code. In particular, clause heads are flattened to facilitate clause indexing.
For example, constraint(head(prime/1, 3-2), args([A])) will be transformed
into something like prime1_3_2(A).



Example 7 (Primes, contd.). The macro expansion phase results in the following
code for our example 3. The code for matching and guard checking has been
inlined. The resulting trivial matchings (line 7), guards (line 3) and bodies
(line 5) have been removed by PE.

    % rule candidates(1) <=> true.
    candidates(A) :-                     % 1
        A==1,                            % 2
        !.                               % 4
    % rule candidates(N) <=> N>1 | M is N-1, prime(N), candidates(M).
    candidates(A) :-                     % 6
        nonvar(A),                       % 8
        A>1,                             % 8
        !,                               % 9
        B is A-1,                        % 10
        prime(A),                        % 11
        candidates(B).                   % 12
    candidates(A) :-                     % 13
        suspend(candidates(A)).          % 14
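
The removal of trivial goals can be illustrated with a small Python sketch. It assumes, purely for illustration, that a clause body is a list of goal strings and that a trivial matching, guard or body has already been reduced to the goal `true`; the function `drop_trivial_goals` is hypothetical and is not the macro mechanism used by the compiler.

```python
from typing import List

def drop_trivial_goals(body: List[str]) -> List[str]:
    # Remove goals that are literally `true`; an empty body collapses to `true`.
    kept = [goal for goal in body if goal.strip() != "true"]
    return kept or ["true"]
```

For the first candidates/1 rule this would turn a body such as `A==1, true, !, true` into `A==1, !`, which is the shape of the listing above.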
4 The Runtime System

The code generated by the compiler utilizes Prolog since CHR compile into 
clauses. Thus e.g. memory management is already taken care of. There are how- 
ever functionalities that are not provided directly by most Prolog implementa- 
tions: 

— We need means to suspend, wake and re-suspend constraint predicates. 

— We need efficient access to suspended constraints in the store through dif- 
ferent access paths. 

The vanilla suspension mechanisms used by earlier CHR implementations ad- 
dressed the first issue above, but did not optimize re-suspension. The second 
issue was partially ignored in that plain linear search in (parts of) the constraint 
store was used. 

4.1 Suspensions 

Typically, the attributes of variables are goals that suspend on that variable.
They are re-executed (woken) each time one of their variables is touched. Via
the attributed variables interface as found in SICStus or ECLiPSe Prolog, the
behaviour of attributed variables under unification is specified with a
user-defined predicate. In the CHR implementation, suspended goals are our means
to store constraints.

In more detail, the components of the CHR suspension data structure are: 

— Constraint goal

— State of constraint

— Unique identifier

— Propagation history

— Re-use counter

The state indicates if the constraint is active or passive.® The unique iden- 
tifier is used, together with the propagation history, to ensure termination for 
propagation rules. Each propagation rule fires at most once for each tuple formed 
by the set of matched head constraints. The re-use counter is incremented with 
every re-use of the suspension. It is used for profiling and some more subtle 
aspects of controlling rule termination outside the scope of this paper. 

To optimize re-suspensions, we made the suspension itself an argument of the 
re-executed goal. Internally, each constraint has an additional argument. When 
first executed, the argument is a free variable. When the constraint suspends, 
this extra argument is bound to the suspension itself. When it runs again, the 
suspension mechanism now has a handle to the suspension and can update its 
state. Traces of this mechanism were removed from the listed code samples in 
this paper to avoid confusion. 
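
The components listed above can be pictured as a record. The Python sketch below is a hedged illustration, not the actual data structure of the runtime system: the field names, the choice to store the propagation history on the first head only, and the state change in `resuspend` are assumptions made here.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Set, Tuple

_ids = count(1)  # source of unique identifiers

@dataclass
class Suspension:
    goal: str                                          # constraint goal
    state: str = "active"                              # "active" or "passive"
    ident: int = field(default_factory=lambda: next(_ids))
    history: Set[Tuple] = field(default_factory=set)   # propagation history
    reuse: int = 0                                     # re-use counter

def may_fire(rule: str, heads: Tuple[Suspension, ...]) -> bool:
    # A propagation rule fires at most once per tuple of matched head constraints.
    key = (rule,) + tuple(s.ident for s in heads)
    if key in heads[0].history:
        return False
    heads[0].history.add(key)   # simplification: recorded on the first head only
    return True

def resuspend(s: Suspension) -> None:
    # Re-use the existing suspension instead of allocating a new one.
    s.reuse += 1
    s.state = "passive"
```

Passing the suspension back to the re-executed goal, as described above, is what makes an in-place update like `resuspend` possible.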

® In actuality the granularity of states and transitions is more copious. The additional 
mechanics mainly address lazy constraint removal to anticipate the possibility of 
subsequent constraint re-introduction. 
4.2 Access Paths

When a CHR searches for a partner constraint, a variable common to two
heads of a rule considerably restricts the number of candidate constraints to
be checked, because both partners must be suspended on this variable. Thus
we usually access the constraint store by looking at only those constraints (cf.
get_constr_via/2). We also know functor and arity of the partner.
Consequently, we want direct access to the set of constraints of given
functor/arity. Earlier implementations performed this selection by linear search
over a part of the suspended constraints.

Access to data through a variable, and then functor/arity, is exactly the
functionality provided efficiently by attributed variables. In our runtime system
we map every functor/arity pair to a fixed attribute slot of a variable at
compile time, yielding constant time access to the constraints of one type. Only
the arguments need to be matched at runtime.
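
The idea of one fixed slot per constraint type can be sketched as follows. The Python class `AttrVar` and the table `SLOT_OF` are hypothetical stand-ins for attributed variables; in the real system the slot is resolved at compile time, whereas the dictionary lookup here only emulates that step.

```python
from typing import Dict, List, Tuple

# Slot numbers fixed in advance, one per functor/arity pair (assumed table).
SLOT_OF: Dict[Tuple[str, int], int] = {("le", 2): 0, ("::", 2): 1}

class AttrVar:
    """A variable carrying one attribute slot per constraint type."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.slots: List[List[tuple]] = [[] for _ in SLOT_OF]

    def suspend(self, functor: str, arity: int, constraint: tuple) -> None:
        # Store the constraint in the slot reserved for its functor/arity.
        self.slots[SLOT_OF[(functor, arity)]].append(constraint)

    def constraints(self, functor: str, arity: int) -> List[tuple]:
        # Direct slot access; only the arguments remain to be matched.
        return self.slots[SLOT_OF[(functor, arity)]]
```

A partner search for `::/2` on a variable thus never touches the `le/2` constraints suspended on the same variable.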



5 Preliminary Empirics 

Benchmarks are difficult, because the new implementation is in SICStus Prolog,
while the previous one was in ECLiPSe Prolog. Attributed variables are
implemented differently in these Prologs. That said, our preliminary
measurements indicate that the new compiler produces code that is roughly twice
as fast. Specifically, we compared our new SICStus CHR implementation with the
one in distribution with ECLiPSe 3.5.2, so the figures reflect the variation
between the two Prolog implementations together with the actual CHR
implementation differences. Times are given in seconds. ECLiPSe and SICStus were
run on the same machine (a Sun workstation). In ECLiPSe, the solvers were
compiled without debugger hooks (option nodbgcomp). We have two columns for
SICStus: one for native code, one for emulated code. The last column relates
emulated SICStus and ECLiPSe.

    Benchmark                            SICStus    a) SICStus    b) ECLiPSe    ratio a/b
                                         native     emulated

    solver bool
      deussenl ulm027rl, all solutions    0.370       0.470         0.900         0.52
      schur(10,_), all solutions          1.020       1.300         2.584         0.50
      schur(13,_), 1st solution           0.230       0.290         1.233         0.24
      schur(13,_), all solutions          2.040       2.520         7.483         0.34
      bnqueens(8,L), 1st solution         1.240       1.500         9.817         0.15
      testbl(5,L), all solutions          0.750       0.900         1.467         0.61
    solver lists
      word problem, 1st solution          0.380       0.460         0.633         0.73
      word problem, 2nd solution          2.940       3.660         4.717         0.78

The new CHR version was faster on all examples, the ratio new vs. old ranging
from 0.15 to 0.78, averaging 0.5 with a standard deviation of 0.2. The boolean
constraint solver features several different kinds of constraints and
consequently benefits more from the new data structures than the solver for
lists (that basically allows for equality between concatenations of lists).
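
The summary figures can be recomputed from the table. The short Python check below uses only the emulated SICStus and ECLiPSe timings listed above; the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which variant was used, but the sample variant rounds to the same value.

```python
from statistics import mean, pstdev

# (a) emulated SICStus seconds, (b) ECLiPSe seconds, one pair per benchmark row
timings = [(0.470, 0.900), (1.300, 2.584), (0.290, 1.233), (2.520, 7.483),
           (1.500, 9.817), (0.900, 1.467), (0.460, 0.633), (3.660, 4.717)]

ratios = [a / b for a, b in timings]   # ratio a/b per benchmark
average = mean(ratios)                 # close to 0.48
spread = pstdev(ratios)                # close to 0.21
```

Rounded to one decimal place, these values agree with the reported average of 0.5 and standard deviation of 0.2.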

Most problems are well-known from the literature: The Deussen problem 
ulm027rl was originally provided by Mark Wallace, Schur’s lemma and Boolean 
n-queens by Daniel Diaz. The final one is a puzzle of unknown origin posted 
by Bart Demoen in the newsgroup comp.lang.prolog. The word problem was 
provided by Klaus Schulz. 

6 Conclusions 

With the CHR system outlined in this paper we aimed at improvements in terms 
of completeness, flexibility and efficiency. 

With regard to completeness some former limitations were removed: 

— The number of heads in a rule is no longer limited to two. The restriction 
was motivated originally by efficiency considerations since more heads need 
more search time. One can encode rules with more than two heads using 
additional auxiliary intermediate constraints. But then, the resulting rules 
are not only hard to understand, they are also less efficient than a true multi- 
headed implementation. In addition, rules apply now in textual order, which 
gives the programmer more control. 

— Guards now support Ask and Tell [28]. In this way, CHR can also be used 
as a general-purpose concurrent constraint language. (In this paper we only 
considered Ask parts of guards.) 

— Due to space limitations we also have not discussed options and pragmas
in this paper. These are annotations to programs, rules or constraints that
enable the compiler to perform powerful optimizations, which can sometimes
make programs terminate or reduce their complexity class.

The gain in flexibility of the implementation proper can be attributed to the 
following facts: 

— The CHR compiler has been “orthogonalized” by introducing three clearly
defined compilation phases. Compilation is now on-the-fly, while loading. The
template-based translation with subsequent macro-based partial evaluation
allows for easy experimentation with different translation schemata. It
created the elbow room for a rather quick implementation of various compiler
options and pragmas. The system was implemented in four man-months.
The compiler is 1100 lines of Prolog, the runtime system around 600, which
together is less than half of the ECLiPSe implementation.

— CHR specific demands, such as access paths and suspension recycling, are 
taken care of explicitly through customized versions of the suspension mech- 
anism. 

— Attributed variables let us efficiently implement the generalized suspension
mechanism needed for CHR at the source level. In particular, constant time
access to constraints of one type can now be provided, instead of the linear
time access in previous implementations.

Plans for the future development of the CHR implementation are the intro- 
duction of a priority scheme, realized through a scheduler [33] that makes the 
order in which simultaneously applicable rules are executed explicit, and the 
factorization of common matching instructions [7]. 

More information about CHR is available at the CHR homepage
http://www.informatik.uni-muenchen.de/~fruehwir/chr-intro.html
Abstract. Many problems from artificial intelligence can be described
as constraint satisfaction problems over finite domains (CSP(FD)), that 
is, a solution is an assignment of a value from a finite domain to each 
problem variable such that a set of constraints is satisfied. Arc-consisten- 
cy algorithms remove inconsistent values from the set of values that can 
be assigned to a variable (its domain), thus reducing the search space. We 
have developed two parallelisation models of arc-consistency to be run 
on MIMD multiprocessors. Two different policies, static and dynamic, 
to schedule the execution of constraints have been tested. In the static 
scheduling policy, the set of constraints is divided into N partitions, 
which are executed in parallel on N processors. We discuss an important 
factor affecting performance, the criterion to establish the partition in or- 
der to balance the run-time workload. In the dynamic scheduling policy, 
any processor can execute any constraint, improving the workload bal- 
ance. However, a coordination mechanism is required to ensure a sound 
order in the execution of constraints. Both parallelisation models have 
been implemented on a CRAY T3E multiprocessor with up to thirty-four
processors. Empirical results on speedup and behaviour of both models 
are reported and discussed. 



1 Introduction 

Constraint Programming over finite domains (CP(FD)) [5,7] has been used for
specifying and solving complex constraint satisfaction and optimisation
problems, such as resource allocation, scheduling and hardware design [6,17].
Finite domain Constraint Satisfaction Problems (CSP) usually describe NP-complete
search problems, but it has been shown that by working locally on constraints 
and their related variables it is possible to dynamically prune the search space in 
an efficient way. Techniques following this approach, called arc-consistency algo- 
rithms, eliminate inconsistent values from the solution space. They can be used 
to reduce the size of the search space both before and while searching. Waltz 
[18] proposed the first arc-consistency algorithm, and several improved versions 
are described in the literature: AC-3 [10], AC-4 [11], AC-5 [15], and AC-6 [1]. 

AC-3, AC-4 and AC-6 deal with extensional constraints, that is, constraints
expressed as the set of tuples that satisfy them, whereas AC-5 can be specialised
for functional, anti-functional and monotonic constraints. This specialisation
provides an efficient decision procedure for the basic constraints of constraint
programming languages.

* Supported by project TIC98-0445-C03-02.

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 134-151, 1999.

We have developed and tested two parallelisation models of arc-consistency 
for MIMD distributed shared memory multiprocessors. These models arise from 
two policies of scheduling the constraints to be processed, static and dynamic. In 
the static model, the set of constraints is partitioned into N partitions, which are 
processed in parallel on N processors. We discuss the two main issues affecting 
the performance of this model: the criterion to distribute constraints among pro- 
cessors, and the frequency of updating shared variables. In the dynamic model 
any processor can process any constraint, improving the workload balance. How- 
ever, a coordination mechanism is required to ensure a sound processing order 
of constraints. 

Several parallel processing methods for solving CSPs have been proposed. In
[20], a parallel constraint solving technique for a special class of CSP, acyclic
constraint networks, is developed. That work also presents some results on
parallel complexity, generalising results in [8]. In [9], it is concluded that
the parallel complexity of constraint networks is critically dependent on subtle
properties of the network which do not influence its sequential complexity. The
authors propose massively parallel processing of arc-consistency with very
simple processing elements.

In [2,12] Nguyen, Deville and Baudot proposed distributed versions for AC-3, 
AC-4, and AC-6 for binary CSPs, based on a static scheduling. Our work con- 
siders both static and dynamic scheduling policies, and it is focused on the AC-5 
specialisation for functional, anti-functional and monotonic n-ary constraints. 
More precisely, it is a parallelisation of the indexical scheme [4,3,16]. We have 
integrated the parallel execution of arc-consistency within a labelling process 
that searches for solutions to the constraint satisfaction problem, embedded in a 
constraint logic programming language. Labelling is performed sequentially, that
is, parallel arc-consistency phases are interleaved with variable-value
assignment phases, performed synchronously and identically by every processing
element, in contrast with other distributed constraint satisfaction techniques
such as [19].

The rest of the paper is organised as follows. The next section describes basic
concepts of constraint programming over finite domains of integers. Section 3
discusses the parallelism presented by the arc-consistency algorithm and
introduces two models to exploit it. Section 4 describes the static scheduling
execution model, whereas Section 5 is devoted to the dynamic one. Section 6
reports and discusses the experimental results. Finally, conclusions are drawn
in Section 7.

2 Constraint Programming 

A constraint satisfaction problem over finite domains may be stated as follows.
Given a tuple $(\mathcal{V}, \mathcal{D}, \mathcal{C})$, where

— $\mathcal{V} = \{v_1, \ldots, v_n\}$ is a set of domain variables,

— $\mathcal{D} = \{d_1, \ldots, d_n\}$ is the set of an initial finite domain
(finite set of values) for each variable,

— $\mathcal{C} = \{c_1, \ldots, c_m\}$ is a set of constraints among the
variables in $\mathcal{V}$. A constraint $c = (V_c, R_c)$ is defined by a subset
of variables $V_c \subseteq \mathcal{V}$, and a subset of allowed tuples of
values $R_c \subseteq \bigotimes_{i:\, v_i \in V_c} d_i$, where $\bigotimes$
denotes Cartesian product.

The goal is to find an assignment for each variable $v_i \in \mathcal{V}$ of a
value from each $d_i \in \mathcal{D}$ which satisfies every constraint
$c_i \in \mathcal{C}$.

A constraint $c = (V_c, R_c) \in \mathcal{C}$, $V_c = \{v_1, \ldots, v_k\}$, is
arc-consistent with respect to domains $\{d_1, \ldots, d_k\}$ iff for all
$v_i \in V_c$, for all $a \in d_i$ there exists a tuple
$(b_1, \ldots, b_{i-1}, a, b_{i+1}, \ldots, b_k) \in R_c$, where $b_j \in d_j$.
A CSP is called arc-consistent iff all $c_i \in \mathcal{C}$ are arc-consistent
with respect to $\mathcal{D}$.
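
The definition of arc-consistency can be checked directly for a constraint given in extension. The following Python sketch is one possible reading of the definition, with hypothetical names: `variables` plays the role of the variable set of the constraint, `relation` the set of allowed tuples, and `domains` maps each variable to its current domain.

```python
from typing import Dict, Sequence, Set, Tuple

def is_arc_consistent(variables: Sequence[str],
                      relation: Set[Tuple[int, ...]],
                      domains: Dict[str, Set[int]]) -> bool:
    # Every value a of every variable v_i needs a supporting tuple in R_c
    # whose other components b_j all lie in the corresponding domains d_j.
    for i, v in enumerate(variables):
        for a in domains[v]:
            supported = any(
                t[i] == a and all(t[j] in domains[w]
                                  for j, w in enumerate(variables))
                for t in relation
            )
            if not supported:
                return False
    return True
```

For the constraint X < Y over the values 1 to 3, the domains {1,2,3} for both variables are not arc-consistent (X = 3 has no support), whereas X in {1,2} and Y in {2,3} are.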

The starting point of this work is a sequential constraint solver which
implements consistency using the indexical scheme [4,3,16]. In this scheme, a
constraint is translated into a set of reactive functional expressions, called
indexicals, which maintain consistency. An indexical has the form
“$v$ in $E(V)$”, where $v \in \mathcal{V}$, $V \subseteq \mathcal{V}$, and
$E(V)$ is a monotonic functional expression which returns a finite set of
values. Given an indexical $I = v \text{ in } E(V)$, we call $V$ its set of
arguments, and we say that, for all $v_i \in V$, $I$ depends on $v_i$, and $I$
writes the domain variable $v$. A constraint $c = (V_c, R_c)$ relating the set
of domain variables $V_c = \{v_1, \ldots, v_k\}$ is translated into a set of $k$
indexicals $\{I_i = v_i \text{ in } E_i(V_c - \{v_i\})\}$. Each indexical $I_i$
writes variable $v_i$ and depends on the remaining $k - 1$ variables. The
functional expressions $E_i(V_c - \{v_i\})$ are defined so that arc-consistency
is achieved (removal of inconsistent values) with respect to constraint $c$.
Most common high-level constraints, such as arithmetic, symbolic and relational
ones, can be easily translated to indexicals.

The set of finite domains that keeps the current domain of each variable in
$\mathcal{V}$ is called the store. The initial value of the store is defined by
$\mathcal{D}$. The execution of an indexical $v \text{ in } E(V)$ is triggered
by changes in the domains of its set of arguments $V$, in a data-driven way.
When an indexical is executed, the domain of $v$ in the store is updated with
$d_v \cap Eval(E(V))$, where $d_v$ denotes the current value of the domain of
$v$ in the store, and $Eval(E(V))$ denotes the evaluation of $E(V)$ with the
current domains of the set of variables $V$ in the store.

Figures 1 and 2 show the sequential arc-consistency algorithm. Its input
argument is the CSP $(\mathcal{V}, \mathcal{D}, \mathcal{C})$ whose
arc-consistency is to be achieved. The set of constraints $\mathcal{C}$ is
expressed as a set of indexicals. The algorithm returns either a store where the
domain for each variable has been pruned achieving arc-consistency, or FAILURE
if inconsistency is detected (the domain of a variable was pruned to an empty
domain).

A sequential arc-consistency algorithm executes indexicals until either the
fixed point is reached, or inconsistency is detected. The fixed point is reached
iff the store is arc-consistent. A propagation queue is used to schedule the
execution of indexicals (PropagationQueue, figure 1). As the result of the
execution of an indexical (Arc_Consistent()), the domain of a variable may be
pruned, and in such a case the variable is queued (Update()). Initially, all
indexicals are executed, initialising the PropagationQueue (line 1). The main
loop (lines 2 to 9) iterates until either the propagation queue is empty, or
inconsistency is detected. In each iteration, a variable is dequeued and those
indexicals that depend on it are executed.

    function Arc_Consistent_CSP(⟨VarSet, DomSet, ConstrSet⟩) : Store
    begin
     1   Queue_Init(PropagationQueue, );
     2   while NOT Empty(PropagationQueue) do
     3     Queue_Pop(PropagationQueue, v_i);
     4     for each indexical I_j which depends on v_i do
     5       if NOT Arc_Consistent(I_j, Store, PropagationQueue) then
     6         return FAILURE;
     7       end-if;
     8     end-for;
     9   end-while;
    10   return Store;
    end;

Fig. 1. Arc consistency algorithm.



    function Arc_Consistent('v_i in E()',
                            Var Store, Var PropQueue) : Boolean
    begin
        NewDomain := Eval(E(), Store);
        return (Update(NewDomain, v_i, Store, PropQueue) <> EMPTY);
    end;

    function Update(NewDomain, v_i, Var Store,
                    Var PropQueue) : RESULT
    begin
        NewDomain := NewDomain ∩ Store[v_i];
        if Empty(NewDomain) then return EMPTY; end-if;
        if (NewDomain ⊂ Store[v_i]) then
            Store[v_i] := NewDomain;
            Queue_Push(v_i, PropQueue);
            return PRUNED;
        end-if;
        return NOT_PRUNED;
    end;

Fig. 2. Store and propagation queue updating.

Termination, correctness, complexity, and properties of the algorithm have 
been studied extensively in the literature [15,14,3]. Correctness is independent 
of the order of reexecution of indexicals, which constitutes the basis for the 
correctness of the parallel version of the algorithm. 
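
To make the propagation loop concrete, the following Python sketch mirrors Figures 1 and 2. It is an illustration under stated assumptions rather than the authors' solver: indexicals are modelled as triples of written variable, argument variables and a Python function, the bound `SUP` and the encoding `lt_indexicals` of the constraint x < y are hypothetical, initial domains are assumed to be non-empty and non-negative, and the failure check during initialisation is an addition not shown in Figure 1.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

Store = Dict[str, Set[int]]
# An indexical `v in E(V)`: (written variable, argument variables, E)
Indexical = Tuple[str, Tuple[str, ...], Callable[[Store], Set[int]]]
SUP = 100  # assumed upper bound on all domain values

def update(new_dom: Set[int], v: str, store: Store, queue: deque) -> str:
    new_dom = new_dom & store[v]
    if not new_dom:
        return "EMPTY"
    if new_dom < store[v]:             # strict subset: the domain was pruned
        store[v] = new_dom
        queue.append(v)
        return "PRUNED"
    return "NOT_PRUNED"

def arc_consistent_csp(store: Store, indexicals: List[Indexical]) -> Optional[Store]:
    queue: deque = deque()
    for v, _args, expr in indexicals:  # line 1: execute every indexical once
        if update(expr(store), v, store, queue) == "EMPTY":
            return None                # FAILURE
    while queue:                       # lines 2 to 9
        changed = queue.popleft()
        for v, args, expr in indexicals:
            if changed in args and update(expr(store), v, store, queue) == "EMPTY":
                return None            # FAILURE
    return store

def lt_indexicals(x: str, y: str) -> List[Indexical]:
    # x < y as two indexicals: x in 0..max(y)-1 and y in min(x)+1..SUP
    return [
        (x, (y,), lambda s: set(range(0, max(s[y])))),
        (y, (x,), lambda s: set(range(min(s[x]) + 1, SUP + 1))),
    ]
```

With the constraints X < Y and Y < Z over the domains {1,2,3}, the loop prunes the store to X = {1}, Y = {2}, Z = {3}; the order in which variables leave the queue does not affect this result.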
3 Parallel Arc-Consistency

The arc-consistency algorithm presents inherent parallelism. Each indexical be- 
haves as a concurrent process which updates the store, triggered by changes in 
the store. There is an inherent sequentiality, as well, since an indexical must be 
executed only as the consequence of a previous execution of another indexical. 
This sequentiality defines a partial order among (re)execution of indexicals. An 
indexical is ready if any of its arguments has changed after its last execution. 
At any time during the execution of the arc-consistency algorithm there will 
be a set of ready indexicals, called the ready set. In a sequential version of a 
consistency algorithm the ready set is stored in a propagation queue (updated 
whenever a variable is modified), ensuring a sound execution order of indexi- 
cals, that is, that an indexical is executed after the pruned variable has been 
updated. Parallel consistency algorithms simultaneously execute the indexicals 
in the ready set, providing mechanisms to maintain a sound order. 
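
The notion of a ready indexical can be expressed with logical timestamps. The Python sketch below is only one way to picture it; the function `ready_set` and the two timestamp maps are assumptions introduced here, and an indexical or variable without a recorded timestamp is treated as never executed or never changed.

```python
from typing import Dict, List, Set, Tuple

def ready_set(indexicals: List[Tuple[str, Tuple[str, ...]]],
              last_run: Dict[int, int],
              last_change: Dict[str, int]) -> Set[int]:
    # Indices of indexicals with an argument changed after their last execution.
    return {
        k for k, (_written, args) in enumerate(indexicals)
        if any(last_change.get(a, -1) > last_run.get(k, -1) for a in args)
    }
```

A sequential solver keeps this set implicitly in the propagation queue, while a parallel solver may execute several of its members at once.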

We have investigated the feasibility of both static and dynamic scheduling 
policies for execution of indexicals. 

In the static scheduling model, the set of indexicals is divided into N parti- 
tions, which are executed in parallel on N processors. A static scheduling ensures 
a sound execution order of indexicals, since the parallel algorithm is basically 
the sequential one, but applied to a subset of the indexicals. The only coordina- 
tion mechanism needed by this model comes from the detection of termination, 
which can be carried out by one of the processors, called the distinguished one. 
The mapping of indexicals to processors is generated before the execution
of arc-consistency. An important factor for the efficiency of this model is the
criterion for the distribution of indexicals among processors; therefore
different criteria have been investigated.

A dynamic scheduling policy requires a coordination mechanism to guarantee 
a sound execution order. Section 5 discusses the dynamic scheduling model where 
a sound execution order is achieved by means of synchronisation points. 

Parallelisation of the consistency algorithm requires every processor to have 
access to a common store. Since the considered parallelisation models are focused 
on distributed shared memory architecture, each processor has a (partial) local 
copy of the store. Changes in the variables’ domains must be communicated to 
concerned processors in order to maintain coherency among local copies of the 
store. 



4 Static Scheduling of Indexicals 

The set of indexicals $\mathcal{C}$ is partitioned into $n$ disjoint subsets,
$\mathcal{C} = C_1 \cup \cdots \cup C_n$. This partitioning induces a
distribution of the set of domain variables $\mathcal{V}$ in $n$ not necessarily
disjoint subsets $V_1, \ldots, V_n$ ($\mathcal{V} = V_1 \cup \cdots \cup V_n$).
For all indexicals $I_j \in C_i$, the variable written by $I_j$, and those
variables on which $I_j$ depends, constitute $V_i$, that is,
$V_i = \bigcup_{I_j \in C_i} (\{v\} \cup V_j)$ with
$I_j = v \text{ in } E(V_j)$. Figure 3 sketches the partitioning process of the
CSP.
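
The induced variable subsets can be computed mechanically from a partition of the indexicals. In the Python sketch below, the round-robin split is merely an assumed placeholder for a distribution criterion, and the names `partition_round_robin` and `variables_of` are hypothetical.

```python
from typing import List, Set, Tuple

Indexical = Tuple[str, Tuple[str, ...]]   # (written variable, argument variables)

def partition_round_robin(indexicals: List[Indexical], n: int) -> List[List[Indexical]]:
    # Split the indexicals into n disjoint subsets C_1..C_n.
    return [indexicals[i::n] for i in range(n)]

def variables_of(part: List[Indexical]) -> Set[str]:
    # V_i: every variable written or read by some indexical in C_i.
    vs: Set[str] = set()
    for written, args in part:
        vs.add(written)
        vs.update(args)
    return vs
```

Whenever two subsets of indexicals mention the same variable, that variable appears in both induced subsets, which is the source of the communication discussed next.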
Partitions $(V_i, D_i, C_i)$ are mapped one-to-one to processing elements $P_i$.
Each processing element $P_i$ performs sequential arc-consistency, executing
those indexicals in $C_i$, and consequently updating local copies of variables
in $V_i$. Since the distribution of the set of variables $\mathcal{V}$ is
non-disjoint, some variables will be located at several processing elements.
Therefore, each processing element $P_i$ must broadcast the pruning of the
domain of variable $v$ to every processing element $P_j$ which has been assigned
any of those indexicals which depend on $v$. Upon receiving the notification,
processing elements $P_j$ intersect their local copies of the domain with the
incoming domain, possibly triggering further propagation. Communication among
processors is also needed in order to detect termination of the algorithm,
either because of reaching the global fixed point, or because of
inconsistency detection.
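
The two communication steps just described, choosing the receivers of an update and merging an incoming domain, can be sketched as follows. The Python functions `recipients` and `receive` are hypothetical helpers for illustration; message transport and the detection of an empty domain on receipt are left out.

```python
from typing import Dict, List, Set, Tuple

Indexical = Tuple[str, Tuple[str, ...]]   # (written variable, argument variables)

def recipients(v: str, sender: int, parts: List[List[Indexical]]) -> Set[int]:
    # Processing elements, other than the sender, holding an indexical that depends on v.
    return {
        j for j, part in enumerate(parts)
        if j != sender and any(v in args for _written, args in part)
    }

def receive(v: str, incoming: Set[int], local_store: Dict[str, Set[int]]) -> bool:
    # Intersect the local copy of v's domain with the incoming one.
    new_dom = local_store[v] & incoming
    pruned = new_dom != local_store[v]
    local_store[v] = new_dom
    return pruned   # True means v must enter the local propagation queue
```

Restricting the broadcast to processing elements that actually depend on the variable keeps the message volume tied to the overlap between the variable subsets.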




Fig. 3. Partitioning the CSP. Sub-CSP $(V_i, D_i, C_i)$ is assigned to processing
element (PE) $P_i$. An edge between two PEs is labelled with the set of variables
located at both PEs ($V_i \cap V_j$). Communication is needed to maintain the
same domain for some of the variables in $V_i \cap V_j$.



4.1 Parallel Algorithm 

Figures 4 and 5 show the parallel execution algorithm. As in the sequential
one, initially every indexical assigned to the processor is executed, initialising
the local propagation queue (line 3). The main loop (lines 4 to 23) is executed
until either global fixed point (GlobalFixedPoint) or inconsistency (Failure)
is detected. The latter can be caused by any of the following:

— an empty domain results from the execution of a local indexical
(Local_Arc_Consistent()).

— an empty domain results from the intersection of the local domain of a
variable with the domain received from another processor
(Remote_Arc_Consistent()).

— inconsistency is detected at (and broadcast from) another processor
(RemoteFailure).
Each processor maintains a private propagation queue (LocalPropQueue).
The inner loop (lines 6 to 14) performs local propagation until either the queue is
empty or inconsistency is detected, like the main loop of the sequential algorithm.
Once a local fixed point is reached, the processor notifies the distinguished
processor of this status (Notify_Local_Fixed_Point()), and it waits (lines 17
to 21) until either:

— global fixed point is detected (Check_Global_Fixed_Point()).

— some other processor communicates inconsistency (RemoteFailure).

— the processor receives a message which updates its local propagation queue.
In this case, the processor notifies the distinguished one (Notify_Active())
and continues executing indexicals.



function Parallel_Consistency(
        (VarSubSet, DomSubSet, ConstrSubSet)) : Store
begin
 1    Parallel_State_Reset();
 2    Synchronisation();
 3    Queue_Init(LocalPropQueue, );
 4    while NOT Failure AND NOT GlobalFixedPoint do
 5       Notify_Active();
 6       while NOT Failure AND NOT Empty(LocalPropQueue) do
 7          Queue_Pop(LocalPropQueue, vi);
 8          for each indexical Ii which depends on vi do
 9             Failure := RemoteFailure OR
10                NOT Local_Arc_Consistent(Ii, Store, LocalPropQueue) OR
11                NOT Remote_Arc_Consistent(Store, LocalPropQueue);
12             if Failure then break; end-if;
13          end-for;
14       end-while;
15       if NOT Failure then
16          Notify_Local_Fixed_Point(...);
17          repeat
18             Failure := RemoteFailure OR
19                NOT Consistency_Msg(Store, LocalPropQueue);
20             GlobalFixedPoint := Check_Global_Fixed_Point();
21          until Failure OR GlobalFixedPoint OR Message_Received();
22       end-if;
23    end-while;
24    if Failure then
25       Synchronisation(); return FAILURE;
26    end-if
27    return Store;
end-function;



Fig. 4. Static Parallel Consistency Algorithm. 
When the local execution of an indexical (Local_Arc_Consistent(), figure 5)
modifies the domain of a variable v (Update(), figure 5), the processor
broadcasts a message (Broadcast_Update(), line 7) to the set of processors
that have been assigned any indexical which depends on variable v. Upon
receiving the message (Remote_Arc_Consistent(), figure 5), these processors
either detect inconsistency or update their local propagation queue and their
local copy of variable v. Whenever a processor detects inconsistency, it
broadcasts the failure to the rest of the processors (Broadcast_Failure()).



function Local_Arc_Consistent('vi in E()',
        Var Store, Var PropQueue) : Boolean
begin
1    NewDomain := Eval(E(), Store);
2    switch (Update(NewDomain, vi, Store, PropQueue))
3       case EMPTY:
4          Broadcast_Failure(RemoteFailure);
5          return FALSE;
6       case PRUNED:
7          Broadcast_Update(vi, Store[vi]);
8    end-switch;
9    return TRUE;
end-function;

function Remote_Arc_Consistent(Var Store,
        Var PropQueue) : Boolean
begin
1    while NOT Empty(MsgQueue) do
2       Pop_Message(MsgQueue, vi, NewDomain);
3       if (Update(NewDomain, vi, Store, PropQueue) = EMPTY) then
4          Broadcast_Failure(RemoteFailure);
5          return FALSE;
6       end-if;
7    end-while;
8    return TRUE;
end-function;



Fig. 5. Parallel consistency functions. 
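
The Update step shared by both functions of figure 5 can be illustrated with a small Python sketch. It is not the authors' C implementation: the set-based domain representation, the status constants and the function names `update` and `remote_arc_consistent` are assumptions made here for illustration, and the failure broadcast is left out.

```python
from collections import deque

# Hypothetical status codes mirroring the EMPTY / PRUNED cases of figure 5.
EMPTY, PRUNED, UNCHANGED = "EMPTY", "PRUNED", "UNCHANGED"

def update(new_domain, var, store, prop_queue):
    """Intersect the local domain of var with new_domain.

    store maps variable names to Python sets (an assumed representation).
    A pruned variable is queued so that its indexicals are re-executed.
    """
    narrowed = store[var] & new_domain
    if not narrowed:
        store[var] = narrowed
        return EMPTY
    if narrowed == store[var]:
        return UNCHANGED
    store[var] = narrowed
    if var not in prop_queue:
        prop_queue.append(var)
    return PRUNED

def remote_arc_consistent(msg_queue, store, prop_queue):
    """Apply every pending remote domain update; False means inconsistency."""
    while msg_queue:
        var, new_domain = msg_queue.popleft()
        if update(new_domain, var, store, prop_queue) == EMPTY:
            return False  # the real algorithm would also broadcast the failure
    return True
```

A remote update that narrows a domain queues the variable for local propagation, whereas one that empties it reports failure.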

The algorithm terminates when every processor reaches a local fixed point
and there are no pending messages. The distinguished processor is the only one
responsible for the detection of termination. However, it performs local
propagation like any other processor. In order to be able to detect the global
fixed point, processors must notify the distinguished one whenever they reach a
local fixed point, along with the number of messages they have sent and received
(Notify_Local_Fixed_Point()), and whenever they leave it due to an incoming
message (Notify_Active()). The distinguished processor keeps a record of
which processors are at a local fixed point, and of the number of messages sent
and received by all processors. When termination is detected, the distinguished
processor notifies the rest of the processors (GlobalFixedPoint).
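
The bookkeeping done by the distinguished processor can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class name `TerminationDetector` is hypothetical, and "no pending messages" is interpreted as the total number of messages sent being equal to the total number received, which the text suggests but does not spell out.

```python
class TerminationDetector:
    """Hypothetical record kept by the distinguished processor."""

    def __init__(self, n_procs):
        self.at_fixed_point = [False] * n_procs
        self.sent = [0] * n_procs
        self.received = [0] * n_procs

    def notify_local_fixed_point(self, proc, sent, received):
        # A processor reports its status together with its message counters.
        self.at_fixed_point[proc] = True
        self.sent[proc] = sent
        self.received[proc] = received

    def notify_active(self, proc):
        # An incoming message made the processor leave its local fixed point.
        self.at_fixed_point[proc] = False

    def global_fixed_point(self):
        # Assumed criterion: everyone idle and no message still in flight.
        return all(self.at_fixed_point) and sum(self.sent) == sum(self.received)
```

Comparing message counters matters because a processor may report a local fixed point while an update addressed to it is still in transit.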

Since this parallel algorithm is part of a labelling procedure where variable- 
value assignment is performed synchronously, a synchronisation point among all 
processing elements is needed at the beginning of the algorithm, just after the 
initialisation of the communication status variables (Parallel_State_Reset(),
line 2, figure 4). Another synchronisation (line 25) is needed if the algorithm 
finishes with failure; otherwise, the global fixed point detection implies a syn- 
chronisation among processors. Synchronisation points guarantee that every pro- 
cessing element waits until the last processing element has finished the current 
arc-consistency cycle before it starts working on the next one. 



4.2 Partition of the CSP 

The way the set of indexicals is partitioned has proved to be an essential factor
for the efficiency of the parallel algorithm. A CSP (V, D, C) can be represented
as a hyper-graph where the set of nodes is the set of domain variables V and the
set of hyper-edges is the set of indexicals defined by C. Therefore, partitioning
the CSP among processors means partitioning the set of hyper-edges into disjoint
subsets, inducing a not necessarily disjoint partitioning of the set of nodes. We
have tested two different graph partition criteria:



— Strength of connection among partitions. 

— Static estimation of run-time ready set distribution. 



Strength of connection among partitions The graph topology can be
considered in order to partition the graph into strongly connected subgraphs, or
into highly disconnected subgraphs.

In the former case, communications are minimised, but the ready set will be
badly balanced, in general. A strongly connected partitioning induces an almost
disjoint partitioning of the set of variables V, thus avoiding communications.
However, it is very likely that most of those indexicals which depend on a variable
v are assigned to the same processing element P. Whenever variable v is pruned,
the ready set is enlarged with those indexicals which depend on v, but almost all
of them will be executed sequentially by P, thus losing the potential
parallelism.

In the latter case, the ready set is better balanced, but it is likely that almost 
every variable will be located at almost every processing element, increasing 
communications. 

Experimental results show that a better balanced ready set pays off more
than a reduction in communications. Moreover, partitioning the CSP into strongly
connected subgraphs is a hard problem, whereas a highly disconnected CSP
partitioning is easily achieved with a shuffle distribution of indexicals.
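
A shuffle distribution can be illustrated with the sketch below. It assumes that "shuffle" means dealing indexicals to processors in round-robin order, and it represents each indexical as a hypothetical (name, variables) pair; neither detail is fixed by the text.

```python
def shuffle_partition(indexicals, n_procs):
    """Deal indexicals to processors in round-robin order (assumed meaning)."""
    parts = [[] for _ in range(n_procs)]
    for i, indexical in enumerate(indexicals):
        parts[i % n_procs].append(indexical)
    return parts

def variables_of(part):
    """Variables that must be located at the processor owning this part."""
    located = set()
    for _name, variables in part:
        located.update(variables)
    return located
```

With such a distribution the indexicals depending on one variable tend to be spread over many processors, which balances the ready set at the price of replicating most variables.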
Static estimation of run-time ready set distribution A partition of the
set of indexicals that balances the run-time ready set is expected to improve the
performance, provided that communications do not increase. Since this model is
based on a static partitioning of the CSP among processors, balancing the run-time
ready set requires some kind of compile-time static estimation.

The idea is to partition the set of indexicals in such a way that updating any
variable causes a similar number of indexicals to be executed by each processor
[13]. We have defined an objective function to be minimised, which considers the
peak workload for each processor and variable. Experimental measures of run-time
workload have confirmed the accuracy of our static estimation. Since we
are dealing with n-ary constraints, finding the optimal solution is an NP problem.
Therefore, we resort to an algorithm which assigns indexicals one by one, in
decreasing arity order, greedily choosing the processor which minimises the
objective function. Solutions found with this greedy algorithm have proved to be
quite close to the optimal one when the CSP consists of a large number of
low-arity constraints. Taking into account that this is just an estimation of the
actual run-time ready set distribution, the greedy approach is fully justified.
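
The greedy assignment can be sketched in Python as follows. The paper does not give its objective function, so the one used here is an assumption: for each variable, take the largest number of dependent indexicals held by any single processor, and sum these peaks over all variables. The names `peak_load` and `greedy_partition` are likewise hypothetical.

```python
from collections import defaultdict

def peak_load(counts, n_procs, variables):
    """Assumed objective: sum over variables of the per-processor peak count."""
    return sum(max(counts[(p, v)] for p in range(n_procs)) for v in variables)

def greedy_partition(indexicals, n_procs):
    """Assign (name, deps) indexicals one by one, in decreasing arity order."""
    variables = sorted({v for _, deps in indexicals for v in deps})
    counts = defaultdict(int)  # (processor, variable) -> dependent indexicals
    assignment = {}
    for name, deps in sorted(indexicals, key=lambda ix: -len(ix[1])):
        best_proc, best_cost = None, None
        for p in range(n_procs):
            for v in deps:          # tentatively place the indexical on p
                counts[(p, v)] += 1
            cost = peak_load(counts, n_procs, variables)
            for v in deps:          # undo the tentative placement
                counts[(p, v)] -= 1
            if best_cost is None or cost < best_cost:
                best_proc, best_cost = p, cost
        for v in deps:
            counts[(best_proc, v)] += 1
        assignment[name] = best_proc
    return assignment
```

This sketch recomputes the objective from scratch for every candidate processor, which keeps it short; a practical implementation would update the objective incrementally.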

5 Dynamic Scheduling of Indexicals 

A dynamic scheduling policy dispatches the ready set of indexicals every
execution cycle, in order to balance workload. However, these models require
mechanisms to ensure that the indexicals depending on a variable are executed
only after the change in the domain of the variable has been recorded in the
store of the processor executing the indexical. The alternatives to achieve a
sound execution order are either to introduce synchronisation points during the
execution (distributed control) or to include a master processor (centralised
control) to perform the dispatching of indexicals. The latter model leads to
tasks of small granularity, inappropriate for a distributed memory architecture.
Therefore, we concentrate on the distributed control alternative.

The dynamic parallelisation model is based on dividing the execution into
synchronised execution cycles. An execution cycle consists of the generation of
the ready set, the distributed selection and execution of the ready set, and a
synchronisation point. In order to distribute the queued indexicals, every
processor must generate identical propagation queues of indexicals. In this way,
each processor independently selects and executes, according to a fixed rule, a
different subset of the indexicals present in the propagation queue.
Synchronisation points between execution cycles are introduced in order to
generate identical propagation queues. In addition, the store must be replicated
in every processor.

The consistency algorithm for this model initially queues every indexical. 
Then, execution cycles are performed until either there are no indexicals to 
execute or inconsistency is detected. An execution cycle comprises the following 
actions: 

— Each indexical in the queue is executed by a particular processor, until the
queue is empty. The coordination criterion ensures that every queued
indexical is selected by only one processor, and that the workload is well
balanced in each execution cycle.

— Modified variables are recorded and broadcast, but indexicals are not
queued in the current propagation queue.

— Changes in domains of variables received from remote processors are updated 
and queued. 

— Once the propagation queue is empty, and after a synchronisation, which en- 
sures that all processors have the same value of the domains of the variables, 
a new propagation queue is generated, queuing all indexicals depending on 
modified variables. 

Two criteria to select indexicals from the queue have been investigated in 
order to tune the model: 

— Assigning the same number of indexicals to each processor. This criterion 
can lead to unbalanced workload since each indexical involves a different 
amount of work. 

— Dynamic distribution, in which each processor selects, in mutual exclusion, 
the next pending indexical. 

The first criterion has yielded better results, showing that its workload balance
is good enough, while the second criterion increases communication overhead.
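
The first criterion can be illustrated with a short sketch. The paper only says that each processor selects its share "according to a fixed rule"; the strided rule below, and the name `select_for`, are assumptions chosen because they need no communication once all processors hold identical queues.

```python
def select_for(rank, n_procs, queue):
    """Indexicals that processor `rank` executes in the current cycle.

    Every processor holds the same queue, so taking every n_procs-th entry
    starting at its own rank yields disjoint subsets of nearly equal size.
    """
    return queue[rank::n_procs]
```

Each queued indexical is picked by exactly one processor, and the subset sizes differ by at most one, although the actual work per indexical may still vary.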



6 Experimental Results 

The presented parallel algorithms have been written in C, and developed and
tested on a CRAY T3E multiprocessor with thirty-four 400-MHz DEC Alpha
processors and 128 MB of memory per processor, under the UNICOS (UNIX)
operating system. Notification of failure, global and local fixed point detection,
activity status, and number of messages sent and received have been implemented
using the remote memory write feature of the CRAY T3E multiprocessor. Queues of
messages are used for receiving domain updates. Messages are broadcast to
queues also using the fast remote memory write feature.

Reported results correspond to the time required to reach the first or all
solutions, depending on the benchmark, performing a first-fail sequential labelling.
Therefore, reported speedup is lower than speedup achieved in a single call to 
the arc-consistency algorithm, since the search for a solution usually comprises 
a large number of calls to the arc-consistency algorithm, executed in parallel, 
interleaved with the selection and assignment of a value to a variable, executed 
sequentially. 
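
The gap between per-call speedup and end-to-end speedup can be illustrated with Amdahl's law. This is a standard model brought in here for illustration, not one the paper invokes, and the fraction used in the usage note is hypothetical rather than measured.

```python
def overall_speedup(parallel_fraction, consistency_speedup):
    """Amdahl's law: end-to-end speedup when only part of the run is parallel.

    parallel_fraction is the share of sequential run time spent inside the
    arc-consistency calls; the labelling steps make up the remainder.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / consistency_speedup)
```

For instance, if 90% of the sequential time were spent in arc-consistency and each call ran 10 times faster, the whole search would run only about 5.3 times faster.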



6.1 Benchmarks 

We have tested the parallelisation models on a set of benchmarks:

1. Arithmetic is a synthetic benchmark. It is formed by sixteen blocks of
arithmetic relations, {B1, ..., B16}. Each block contains fifteen equations and
inequations among six variables. Blocks are connected by an additional
equation between a pair of variables, one from Bi and the other one from a
different block. Coefficients were randomly generated. The goal is to find an
integer solution vector.

2. Suudoku is a Japanese crypto-arithmetic problem. Given a grid of 25x25
squares, where 317 of them are filled with a number between 1 and 25,
fill the rest of the squares such that each row and column is a permutation of
the numbers 1 to 25. Furthermore, each of the twenty-five 5x5 squares starting
in columns (rows) 1, 6, 11, 16, 21 must also be a permutation of the numbers 1
to 25.

3. The N-Queens problem consists in placing N queens on an NxN chess board
in such a way that no two queens attack each other. The instance presented
corresponds to N = 111, a size which leads to a significant execution time.

4. Parametrizable Binary Constraint Satisfaction Problem (PBCSP). Synthetic 
PBCSPs allow studying the performance of arc-consistency algorithms as 
some significant problem parameters vary. Instances of this problem are ran- 
domly generated given four parameters: number of variables, the size of the 
initial domains, density, and tightness. All constraints are binary, that is, 
they involve only two variables. A constraint is defined as the set of pairs of 
values that satisfies it. Density and tightness are defined as follows: 

$$\mathrm{Density} = \frac{nc}{nv - 1}, \qquad \mathrm{Tightness} = 1 - \frac{np}{ds^{2}}$$

where nv is the number of variables, nc is the number of constraints involving 
one variable (it is the same for all variables), np is the number of pairs that 
satisfies the constraint, and ds is the size of the initial domains. Figure 6 
reports results obtained for an instance of this problem where nv = 100, 
ds = 20, Density = 0.75, and Tightness = 0.85. 
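
The parameters of the reported instance can be related to the table figures with a small calculation. The sketch reads the two definitions as Density = nc / (nv - 1) and Tightness = 1 - np / ds^2, and assumes that each binary constraint is counted once for each of its two variables; the function names are introduced here for illustration only.

```python
def constraints_per_variable(density, nv):
    """nc, obtained by inverting Density = nc / (nv - 1)."""
    return density * (nv - 1)

def total_constraints(density, nv):
    """Each binary constraint involves two variables, hence the division by 2."""
    return constraints_per_variable(density, nv) * nv / 2

def satisfying_pairs(tightness, ds):
    """np, obtained by inverting Tightness = 1 - np / ds**2."""
    return (1 - tightness) * ds ** 2
```

For nv = 100 and Density = 0.75 this gives 3712.5 constraints, in line with the 3,713 listed in Table 1, and for ds = 20 and Tightness = 0.85 about 60 of the 400 value pairs satisfy each constraint.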



Table 1. Benchmarks characteristics.

                             Arithmetic   Suudoku      N-Queens     PBCSP
Search for                   first sol.   first sol.   first sol.   all sol.
No. of Variables             126          308          111          100
No. of Constraints           254          13,942       6,105        3,713
No. of Indexicals            1,468        27,884       12,210       7,426
No. of Calls to Consistency  15,969       72,196       8,660        65
Seq. Exec. Time (s.)         15.05        132.98       12.62        5.25
No. of ind. executed         1,953,660    9,764,960    246,262      318,552
Avg. time per call (ms.)     0.9          1.8          1.5          80.8


Table 1 summarises relevant data about the four benchmarks. The first three
benchmarks are executed searching for the first solution, whereas the fourth one
keeps searching until all solutions (40) are found. The table shows the number of
variables, number of constraints and number of indexicals for all benchmarks, as
well as the number of calls to the arc-consistency algorithm. It also reports the
sequential execution time, and the total number of indexicals executed in the
sequential version. Finally, the table reports the average execution time per call
to the arc-consistency algorithm, which indicates the granularity of the process
to be parallelised.
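
The granularity figures can be reproduced from the other rows of Table 1. The sketch below simply divides the sequential execution time by the number of calls to the arc-consistency algorithm; the dictionary layout and the name `avg_ms_per_call` are choices made for this illustration.

```python
# (sequential execution time in seconds, calls to consistency), from Table 1
benchmarks = {
    "Arithmetic": (15.05, 15969),
    "Suudoku": (132.98, 72196),
    "N-Queens": (12.62, 8660),
    "PBCSP": (5.25, 65),
}

def avg_ms_per_call(seconds, calls):
    """Average time per arc-consistency call, in milliseconds."""
    return 1000.0 * seconds / calls

for name, (seconds, calls) in benchmarks.items():
    print(name, round(avg_ms_per_call(seconds, calls), 1))
```

Rounded to one decimal, the results match the last row of the table, and they show that a PBCSP call is roughly two orders of magnitude coarser than a call in the other three benchmarks.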

6.2 Speedup 



[Figure 6: speedup charts; the surviving panel titles are Arithmetic and Suudoku.]

Fig. 6. Speedup curves for selected benchmarks.



The charts in figure 6 show, for each benchmark, the speedup versus the number
of processors. For the static scheduling policy the ready set balance estimation
was used, comparing two broadcast frequencies: immediate (solid line) and at
fixed point (dotted line). The chart for the Arithmetic benchmark also shows the
speedup obtained with a strongly connected graph partitioning (dashed line).
This criterion
has not been considered for the other benchmarks, since it clearly provides worse
results than ready set balance, and because it is too computationally expensive
to apply. The speedup obtained with the dynamic scheduling policy (dot-dashed
line) is worse than the best one of the static policy, mainly because the overhead
of the synchronisation points introduced in the dynamic model is too large
relative to the granularity of indexicals in the considered problems. However, this
model could be more efficient with large-granularity indexicals or propagators,
such as those arising from global constraints.

It can also be observed that whereas the PBCSP problem presents a nearly
linear speedup for the best static scheduling policy, the speedup for the rest
of the benchmarks stops increasing beyond a certain number of processors. The main
factor for this different behaviour is that in the PBCSP benchmark calls to the
arc-consistency algorithm have a larger execution time, and indexical executions
have larger granularity (see Table 1). In addition, PBCSP has a constraint graph
with a more uniform topology, leading to a better workload balance. In order
to study this factor we have measured the workload distribution among the
processing elements.

[Figure 7: two charts, with panel titles Arithmetic and PBCSP.]

Fig. 7. Average, minimum and maximum number of executions of indexicals per
processor.



Figure 7 shows the average, the minimum, and the maximum number of
indexicals executed per processor, for the dynamic scheduling policy and the static
scheduling policy with immediate broadcast. The difference between minimum
and maximum indicates workload balance quality. It can be observed that, in the
Arithmetic problem, the larger the number of processors, the worse the workload
balance. This fact limits the performance, since the execution time corresponds
to that of the slowest processor, because of serialisation between consecutive
calls to arc-consistency. The dynamic scheduling policy exhibits a better workload
balance. Nevertheless, the speedup for this model is limited by the need for
synchronisation points. For the PBCSP benchmark, the minimum and maximum curves
do not differ in either the static or the dynamic policy, indicating a high quality
workload balance. Nevertheless, it can be expected that the PBCSP benchmark will
also reach a saturation point for a larger number of processors.



6.3 Scaleup 

It is important to know how the performance of the parallel system depends 
on the characteristics of the problem. The PBCSP benchmark, as a generic 
parametrizable constraint satisfaction problem, offers the opportunity to study 
what characteristics are desirable in a problem in order to achieve a high per- 
formance when executed in parallel. 



Table 2. Benchmarks characteristics of figure 8(a), scaleup vs. number of
variables.

No. of variables             25        50        100       200
No. of Constraints           228       910       3,713     14,932
No. of Indexicals            456       1,820     7,426     29,864
No. of Calls to Consistency  63        61        65        66
Seq. Exec. Time (s.)         0.29      1.19      5.25      22.46
No. of ind. exec.            18,192    71,218    318,552   1,311,061
Avg. time per call (ms.)     4.6       19.5      80.8      340.3



Table 3. Benchmarks characteristics of figure 8(b), scaleup vs. density.

Density                      0.25      0.50      0.75      1.00
No. of Constraints           1,216     2,464     3,713     4,950
No. of Indexicals            2,432     4,928     7,426     9,900
No. of Calls to Consistency  64        64        65        62
Seq. Exec. Time (s.)         1.78      3.40      5.25      6.59
No. of ind. exec.            105,542   205,400   318,552   400,430
Avg. time per call (ms.)     27.8      53.1      80.8      106.3



The size of a PBCSP mainly depends on the number of variables and the
density of the constraint graph. Figure 8(a) shows the speedup versus the
number of processors, for four different numbers of variables, fixing density to 0.75,
tightness to 0.85, and domain size to 20. Figure 8(b) shows the speedup versus
the number of processors, for different densities, fixing the number of variables
to 100, tightness to 0.85, and domain size to 20. Tables 2 and 3 summarise
relevant data about the problem instances used to plot the curves. All instances
were run searching for all solutions (40). Both charts indicate that the larger the
problem is, the higher the speedup obtained. This fact indicates the suitability of
the system for large problems, provided the constraint graph is uniform, which is a
highly desirable property in order to solve the real scale combinatorial problems
which constraint programming aims to tackle.



[Figure 8: two speedup charts. Panel (a), Number of Variables, has curves for
25, 50, 100 and 200 variables. Panel (b), Density, has curves for Density = 0.25,
0.50, 0.75 and 1.00. Horizontal axis of both panels: number of processors, from
1 to 31.]

Fig. 8. Scaleup with the number of variables and the density of the problem.



7 Conclusions 

We have developed and evaluated parallelisation models of an arc-consistency 
algorithm for constraint satisfaction problems over finite domains. These models 
have been implemented on a CRAY T3E, a distributed shared memory MIMD 
multiprocessor, and empirical data is reported for several benchmarks. 

Two different techniques for scheduling the execution of constraints, dynamic
and static, have been tested. The dynamic model has shown poor speedups,
particularly when compared with those obtained with the static model. Therefore,
we have focused our work on the static scheduling policy.

A number of topics affecting performance have been investigated in order to 
tune the static scheduling model. The way constraints are distributed among pro- 
cessors, and the frequency of updating shared variables, are determining factors 
for the performance of the model. The study of the distribution of constraints 
among processors has shown that a strongly connected partitioning (high num- 
ber of shared variables) is worse than a partition based on an estimation of the 
run-time workload balance. Tests on broadcast frequency showed that an
immediate broadcast is preferable.

The speedup obtained is nearly linear for the PBCSP benchmark, whereas for the
rest of them it stops increasing beyond a problem-dependent number of processors.
This difference is mainly due to the more uniform constraint graph and larger
granularity of the PBCSP benchmark, which lead to a better workload balance.
Even so, the PBCSP benchmark would also reach a saturation point for a larger
number of processors.

In order to study how the performance of the parallel system depends on the 
characteristics of the constraint satisfaction problem to solve, the parametrizable 
synthetic benchmark has been tested for different sets of parameters. Results 
show that the system is better suited for large scale problems with a dense 
constraint graph. 
Abstract. In this paper we propose a lazy functional logic language,
named SETA, which supports multisets, built-in arithmetic constraints
over the domain of real numbers, and various symbolic
constraints over datatypes. As main theoretical results, we have proved
the existence of free term models for all SETA programs and we have
developed a correct and complete goal solving mechanism.



1 Introduction 

Functional logic programming (FLP, for short) aims to combine into a single
paradigm the nicest properties of both functional and logic programming. In
many approaches to FLP (see [11] for a survey) programs are seen as
constructor-based conditional rewrite systems. One such approach is the CRWL
framework of [9], where classical equational logic is replaced by a suitable
constructor-based rewriting logic which properly expresses the behaviour of
reduction for lazy, partial and possibly non-deterministic functions.

Most approaches to FLP, including CRWL, are based on free constructors.
However, for some applications it is more convenient to represent data by means
of non-free constructors, for which some equational specification is given. An
extension of [9] along this line has been presented in [4,5], where a general
framework for FLP with polymorphic algebraic types is investigated. One
particularly interesting case is that of multisets, which are known to be useful to
model a variety of scenarios, like the Gamma programming model [6] or action and
change problems [18]. On the other hand, many problems involve computations over
specific domains (like real numbers, boolean functions or finite domains) for
which the constructor-based approach is not adequate at all. A crucial
contribution to this issue within the field of logic programming has been the
constraint logic programming paradigm [14] and its different instances, such as
the language CLP($\mathcal{R}$) [15], which is known to have a wide and growing
range of applications.

Our aim in this work is to merge the expressive power of polymorphic
algebraic types and constraints into a single language (named SETA¹) which can be
understood as an extended instance of the framework in [4,5]. It is an instance

* This research has been partially supported by the Spanish National Project
TIC98-0445-C03-02 “TREND” and the Esprit BRA Working Group EP-22457 “CCLII”.

¹ SETA is not an acronym, but simply the Spanish word for mushroom.



G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 152-169, 1999.
(c) Springer-Verlag Berlin Heidelberg 1999




Functional Plus Logic Programming 153 

in the sense that the multiset constructor is the unique non-free constructor
we consider, but it is extended since it incorporates primitive “built-in” data
(the real numbers), constraints over them and also “symbolic” constraints over
constructor terms (in particular, over multisets), including equality, disequality,
membership and non-membership. Following a line similar to that of [9,4,5], we
develop proof-theoretic and model-theoretic semantics (including the existence
of free term models) of SETA programs, and then propose an operational
semantics by means of a sound and complete goal solving calculus which combines
lazy narrowing, unification modulo the equational axiom of multisets and constraint
solving. The combination of lazy functions, multisets, arithmetic constraints and
symbolic constraints is not found in other related declarative languages, such as
[8,7,13]. Moreover, the range of potential applications of SETA is very large.
To those which make use in isolation of either multisets or real constraints, we
must add others which can benefit from the combination of both. One such
application field is the parsing of visual languages [12,17], where symbolic and
arithmetic constraints can be naturally combined to specify the construction of
complex graphic figures from (a multiset of) components.

The rest of the paper is organized as follows. The next section introduces the
language SETA, includes a small example illustrating the capabilities of SETA and
develops a proof theory in the form of a constrained, goal-oriented,
constructor-based rewriting calculus. Section 3 presents the model theory for SETA
programs, whereas Section 4 contains a sketch of the goal solving calculus for SETA
and the corresponding soundness and completeness results. The last section
summarizes some conclusions. Due to space limitations, proofs and many other
technical details have been left out, but they can be found in [2].

2 The Language SETA 

Because SETA handles real numbers, the presentation of the
language is based on two levels: the primitive level (containing everything related
to real numbers) and the symbolic level (containing multisets, free datatypes, and
constraints over them).



2.1 The Primitive Level 

$\Sigma_p$ denotes the primitive signature, defined as the triple
$\langle PT, PO, PP \rangle$, where $PT$ has $Real$ as its unique Primitive Type,
$PO = \{0, 1 : \to Real,\ +, * : (Real, Real) \to Real\}$ is a set of type
declarations for Primitive Operations, and
$PP = \{==, /=, < : (Real, Real)\}$ is a set of type declarations for Primitive
Predicates.

Given a denumerable set $x, y, \ldots \in DV$ of data variables, the set
$T_p(DV)$ of primitive terms $t^p, s^p, \ldots$ is built from $DV$ and $PO$. The set
$R_p(DV)$ of primitive constraints is defined as
$\varphi^p ::= True \mid t_1^p \diamond t_2^p \mid \varphi_1^p \wedge \varphi_2^p \mid \exists x\, \varphi^p$,
where² $x \in DV$, $\diamond \in \{==, /=, <\}$, $t_i^p \in T_p(DV)$,
$\varphi_i^p \in R_p(DV)$, $1 \le i \le 2$.

² Note that the constraints $>$, $\le$ and $\ge$ may be easily defined from $/=$ and $<$.
In the following, we will use $\mathcal{R}$ to denote the field of real numbers
$(\mathbb{R}, 0^{\mathcal{R}}, 1^{\mathcal{R}}, +^{\mathcal{R}}, *^{\mathcal{R}})$.
$\eta^p$ will denote a primitive valuation (i.e., a mapping from $DV$ to
$\mathbb{R}$), and $(\mathcal{R}, \eta^p) \models_{\mathcal{R}} \varphi^p$ will
express the validity of the primitive constraint $\varphi^p \in R_p(DV)$
in $\mathcal{R}$ under $\eta^p$. Finally, given a set $\Gamma_p$ of primitive
constraints and a constraint $\varphi^p \in R_p(DV)$, the notation
$\Gamma_p \models_{\mathcal{R}} \varphi^p$ will mean that $\varphi^p$ is a logical
consequence of $\Gamma_p$ when both are interpreted over $\mathcal{R}$.



2.2 The Symbolic Level 

Let $TV$ be a countable set of type variables $\alpha, \beta, \ldots$, and
$TC = \bigcup_{n \ge 0} TC^n$ a countable alphabet of type constructors
$K, K', \ldots$ including the multiset type constructor $Mset \in TC^1$.
Polymorphic types $\tau, \tau', \ldots \in Type_{TC}(TV)$ are built
from $TV$ and $TC$. The set of type variables occurring in $\tau$ is written
$tvar(\tau)$.

We define a polymorphic signature $\Sigma$ over $TC$ as
$\Sigma = \Sigma_p \cup \langle TC, DC, FS, PS \rangle$, where:

• $DC$ is a set of type declarations for data constructors, of the form
$c : (\tau_1, \ldots, \tau_n) \to \tau$ with
$\bigcup_{i=1}^{n} tvar(\tau_i) \subseteq tvar(\tau)$ and $\tau \notin TV \cup PT$.
We assume that $DC$ contains the type declarations $\{[\ ]\} : \to Mset(\alpha)$
(representing the empty multiset) and
$\{[\,\cdot \mid \cdot\,]\} : (\alpha, Mset(\alpha)) \to Mset(\alpha)$
(representing the multiset constructor³). The multiset constructor is governed by
the equation $(mset) : \{[\,x, y \mid xs\,]\} \approx \{[\,y, x \mid xs\,]\}$. Here,
we have used $\{[\,x, y \mid xs\,]\}$ as an abbreviation for
$\{[\,x \mid \{[\,y \mid xs\,]\}\,]\}$. In the sequel we will continue using such
notation.

• $FS$ is a set of type declarations for defined function symbols, of the form
$f : (\tau_1, \ldots, \tau_n) \to \tau$.

• $PS = \{==, /= : (\alpha, \alpha),\ \in, \notin : (\alpha, Mset(\alpha))\}$ is a
set of type declarations for predicate symbols. $==$ and $/=$ stand for strict
equality and disequality respectively, whereas $\in$, $\notin$ represent
membership and non-membership respectively.
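
The effect of the (mset) equation on ground, finite multisets can be illustrated with the Python sketch below. It only shows that multisets built with the constructor are compared up to reordering of their elements; it does not model SETA's laziness, variables or unification, and the names `cons`, `mset_equal` and `member` are hypothetical helpers introduced here.

```python
from collections import Counter

def cons(x, xs):
    """Model of the multiset constructor: add one copy of x to xs."""
    return [x] + xs

def mset_equal(xs, ys):
    """Equality modulo (mset): same elements with the same multiplicities."""
    return Counter(xs) == Counter(ys)

def member(x, xs):
    """Membership of x in the multiset xs."""
    return x in xs
```

Swapping the two outermost elements leaves the multiset unchanged, while adding a further copy of an element produces a different one.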

We require that $DC \cup FS$ does not include multiple type declarations for the
same symbol. We will write $h \in DC^n \cup FS^n$ to indicate the arity of a symbol
according to its type declaration. In the following, $DC_\perp$ will denote $DC$
extended by a new declaration $\perp : \to \alpha$. The bottom constant
constructor $\perp$ is intended to represent an undefined value. Analogously,
$\Sigma_\perp$ will denote the result of replacing $DC$ by $DC_\perp$ in $\Sigma$.

Total expressions $e, es, r, \ldots \in E_\Sigma(DV)$ are built from $DV$, $PO$,
$DC$ and $FS$. The set $E_{\Sigma_\perp}(DV)$ of partial expressions is defined in
the same way, but using $DC_\perp$ in place of $DC$. Total data terms
$T_\Sigma(DV) \subseteq E_\Sigma(DV)$ and partial data terms
$T_{\Sigma_\perp}(DV) \subseteq E_{\Sigma_\perp}(DV)$ are built by using variables,
primitive operations and data constructors only. In the sequel, we reserve
$t, ts, s, l, ls, m, ms$ to denote possibly partial data terms, and we write
$dvar(e)$ for the set of all data variables occurring in an expression $e$.

The set $\varphi, \varphi', \ldots \in R_{\Sigma_\perp}(DV)$ of partial constraints
is defined as
$\varphi ::= True \mid e_1 \diamond e_2 \mid \varphi_1 \wedge \varphi_2 \mid \exists x\, \varphi$,
where $\diamond \in \{==, /=, <, \in, \notin\}$, $x \in DV$,
$e_i \in E_{\Sigma_\perp}(DV)$, $\varphi_i, \varphi \in R_{\Sigma_\perp}(DV)$,
$1 \le i \le 2$. We say that $\varphi \in R_{\Sigma_\perp}(DV)$ is a total
constraint if $\perp$ does not occur in $\varphi$. The set of all total constraints
is denoted by $R_\Sigma(DV)$.

³ The intended meaning of $\{[\,x \mid xs\,]\}$ is to add a new copy of the element
$x$ to the multiset $xs$.

The rewriting calculus in Subsection 2.3 will deal with an extended notion of
expressions and constraints, for which all real numbers are available as new
constants. Formally, the set $E^*_{\Sigma_\perp}(DV)$ of extended partial expressions is defined
as $e ::= \perp \mid x \mid 0 \mid 1 \mid d \mid e_1 \diamond e_2 \mid h(e_1, \ldots, e_n)$, where $h \in DC^n \cup FS^n$,
$e_i \in E^*_{\Sigma_\perp}(DV)$, $1 \leq i \leq n$, $d \in \mathbb{R}$, $\diamond \in \{+, *\}$, $x \in DV$. If we eliminate $\perp$ in the
definition above, we obtain the set $E^*_\Sigma(DV)$ of extended total expressions. Analogously,
by ignoring $FS$ we can define the sets $T^*_{\Sigma_\perp}(DV)$ and $T^*_\Sigma(DV)$ of extended
partial and total data terms, respectively. The sets $R^*_{\Sigma_\perp}(DV)$ and $R^*_\Sigma(DV)$ of
extended partial constraints and extended total constraints are defined similarly to
the sets $R_{\Sigma_\perp}(DV)$ and $R_\Sigma(DV)$ respectively, but now all considered expressions
must be extended. An extended primitive term (an element of $T^*_P(DV)$) will be an extended
total data term not containing symbols in $DC$. Similarly, the set $R^*_P(DV) \subseteq R^*_\Sigma(DV)$
of extended primitive constraints is composed of all those extended total constraints
containing only primitive symbols and variables.

We define possibly partial data substitutions $\delta$ as mappings from $DV$ to
$T^*_{\Sigma_\perp}(DV)$. In the rest of the paper, $DSub(A)$, where $A$ is a subset of $T^*_{\Sigma_\perp}(DV)$,
will denote the set of all data substitutions mapping $DV$ to $A$.

An environment is defined as any set $V$ of type-annotated data variables $x : \tau$,
such that $V$ does not include two different annotations for the same variable.
By considering that any $d \in \mathbb{R}$ has an associated type declaration $d : \to Real$,
it is possible to determine when an extended partial expression $e \in E^*_{\Sigma_\perp}(DV)$
has type $\tau$ in an environment $V$. The set $E^{*\tau}_{\Sigma_\perp}(V)$ (resp. $E^{*\tau}_{\Sigma}(V)$) of all extended
partial expressions (resp. extended total expressions) that admit type $\tau$ w.r.t.
$V$ is defined in the usual way; see e.g. [4]. Note that $E^{*\tau}_{\Sigma_\perp}(V)$ has $T^{*\tau}_{\Sigma_\perp}(V)$ and
$T^{*\tau}_{\Sigma}(V)$ as subsets. We can talk also about the sets $E^{\tau}_{\Sigma_\perp}(V)$ and $T^{\tau}_{\Sigma_\perp}(V)$ (resp.
$E^{\tau}_{\Sigma}(V)$ and $T^{\tau}_{\Sigma}(V)$) of partial expressions and terms (resp. total expressions and
terms) which admit type $\tau$ in $V$.

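A constraint $\varphi \in R^*_{\Sigma_\perp}(DV)$ is well-typed w.r.t. an environment $V$ iff one of
the following items holds:

> $\varphi = True$, or $\varphi = e_1 \leq e_2$ and $e_i \in E^{*Real}_{\Sigma_\perp}(V)$, $1 \leq i \leq 2$.

> $\varphi = e_1 \diamond e_2$, $\diamond \in \{==, /=\}$ and $e_i \in E^{*\tau}_{\Sigma_\perp}(V)$, $1 \leq i \leq 2$, for some
$\tau \in Type_{TC}(TV)$. Or $\varphi = e_1 \diamond e_2$, $\diamond \in \{\in, \notin\}$ and $e_1 \in E^{*\tau}_{\Sigma_\perp}(V)$,
$e_2 \in E^{*Mset(\tau)}_{\Sigma_\perp}(V)$, for some $\tau \in Type_{TC}(TV)$.

> $\varphi = \varphi_1 \wedge \varphi_2$ and $\varphi_i$ are well-typed w.r.t. $V$, $1 \leq i \leq 2$.

> $\varphi = \exists x\, \varphi'$ and there exists $\tau \in Type_{TC}(TV)$ such that $\varphi'[x/y]$ is well-typed
w.r.t. $V[y : \tau]$, where $y$ is a fresh variable and $V[y : \tau]$ denotes the environment
resulting from adding to $V$ the type annotation $y : \tau$.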

Assuming a type declaration $f : (\tau_1, \ldots, \tau_n) \to \tau \in FS$, a defining rule for
$f$ has the form $f(t_1, \ldots, t_n) \to r \Leftarrow \varphi$, where the left-hand side is linear,
$t_i \in T_\Sigma(DV)$ does not contain any primitive symbol in $PO$, $1 \leq i \leq n$, $r \in E_\Sigma(DV)$,
$\varphi \in R_\Sigma(DV)$ and $dvar(r) \subseteq \bigcup_{i=1}^{n} dvar(t_i)$. A program rule is well-typed iff there
exists an environment $V$ such that $t_i \in T^{\tau_i}_{\Sigma}(V)$, $1 \leq i \leq n$, $r \in E^{\tau}_{\Sigma}(V)$ and $\varphi$ is
well-typed w.r.t. $V$.

We define programs as pairs $\mathcal{P} = \langle \Sigma, R \rangle$, where $\Sigma$ is a polymorphic signature
and $R$ is a finite set of defining rules for defined function symbols in $\Sigma$. We say
that a program $\mathcal{P}$ is well-typed iff all program rules in $R$ are well-typed. Note
that primitive operations are not allowed in the left-hand sides of program rules.
This implies no loss of expressiveness, due to the availability of equality constraints.

As argued in [4,5,3], the expressive power of the multiset constructor can be
used to tackle problems related to the widely applicable idea
of multiset rewriting; see e.g. [6,18,17]. We present here a small example which
shows the advantages of combining multisets, real numbers and constraints to
generate and recognize graphic figures, a task closely related to
parsing visual languages [12,17].

Example 1. Consider the problem of building quadrilaterals from given points 
in the plane, in such a way that the resulting figures have no common vertices. 
Points and quadrilaterals can be respectively represented by means of the data 
constructors: 

P : (Real, Real) → Point,   Q : (Point, Point, Point, Point, Point) → Quadrilateral.

The intended meaning of two consecutive points P, P' in a term of the form
Q(P1, P2, P3, P4, P1) is that there exists a line from P to P'. Figures will be
multisets of quadrilaterals. In order to solve the problem, we define the functions:

figure : Mset(Point) → Mset(Quadrilateral)
figure({[ ]}) → {[ ]}
figure({[p1, p2, p3, p4 | ps]}) → {[Q(p1, p2, p3, p4, p1) | figure(ps)]}
    ⇐ p1 ∉ ps ∧ p2 ∉ ps ∧ p3 ∉ ps ∧ p4 ∉ ps ∧ quadrilateral(p1, p2, p3, p4) == True

where the function quadrilateral checks if four points generate a quadrilateral
by using the following result: "The four midpoints of the lines composing a
quadrilateral form a parallelogram". The code for this function is the following:

quadrilateral : (Point, Point, Point, Point) → Bool
quadrilateral(p1, p2, p3, p4) → True
    ⇐ midpoint(p1, p2, m1) == True ∧ midpoint(p2, p3, m2) == True ∧
      midpoint(p3, p4, m3) == True ∧ midpoint(p4, p1, m4) == True ∧
      parallelogram(m1, m2, m3, m4) == True

Of course, True is a boolean constant. Functions midpoint and parallelogram are
defined as:

midpoint : (Point, Point, Point) → Bool
midpoint(P(x1, y1), P(x2, y2), P(x3, y3)) → True
    ⇐ 2 * x3 == x1 + x2 ∧ 2 * y3 == y1 + y2

parallelogram : (Point, Point, Point, Point) → Bool
parallelogram(P(x1, y1), P(x2, y2), P(x3, y3), P(x4, y4)) → True
    ⇐ x1 − x4 == x2 − x3 ∧ y1 − y4 == y2 − y3

Considering the goal:

G ≡ figure({[ P(−3, 0), P(3, 0), P(4, 3), P(5, −4),
              P(8, 0), P(12, 3), P(14, −2), P(11, −6) ]}) == l

and using the lazy narrowing calculus presented in Sect. 4, we can obtain various
computed answers, e.g.

l = {[ Q(P(−3, 0), P(4, 3), P(3, 0), P(5, −4), P(−3, 0)),
       Q(P(8, 0), P(12, 3), P(14, −2), P(11, −6), P(8, 0)) ]}
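
To make the geometric test of Example 1 concrete, the following Python sketch evaluates it directly on the points of the computed answer. It is a plain functional rendering under stated assumptions, not the lazy narrowing computation of the paper: points are pairs of integers, `midpoint` is computed rather than solved as a constraint, and `figure` groups the points in the order given instead of exploring all multiset permutations.

```python
from fractions import Fraction

def midpoint(p, q):
    """Return m with 2*m.x == p.x + q.x and 2*m.y == p.y + q.y."""
    return (Fraction(p[0] + q[0], 2), Fraction(p[1] + q[1], 2))

def parallelogram(m1, m2, m3, m4):
    """Mirror of the rule body: m1 - m4 == m2 - m3, coordinate by coordinate."""
    return (m1[0] - m4[0] == m2[0] - m3[0]) and (m1[1] - m4[1] == m2[1] - m3[1])

def quadrilateral(p1, p2, p3, p4):
    """The midpoints of the four sides must form a parallelogram."""
    return parallelogram(midpoint(p1, p2), midpoint(p2, p3),
                         midpoint(p3, p4), midpoint(p4, p1))

def figure(points):
    """Group the points four by four in the given order; None if a guard fails."""
    if len(points) % 4 != 0:
        return None
    quads = []
    for i in range(0, len(points), 4):
        group, rest = points[i:i + 4], points[i + 4:]
        # Guards of the rule: no vertex shared with later figures, midpoint test holds.
        if any(p in rest for p in group) or not quadrilateral(*group):
            return None
        quads.append(tuple(group) + (group[0],))
    return quads

# The eight points, listed in the order used by the computed answer above.
answer_order = [(-3, 0), (4, 3), (3, 0), (5, -4),
                (8, 0), (12, 3), (14, -2), (11, -6)]
result = figure(answer_order)
```

A side observation from the algebra: m1 − m4 and m2 − m3 both equal (p2 − p4)/2, so in this direct evaluation the midpoint test holds for any four points; the selective part of `figure` is the non-membership guard.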



2.3 A Constrained Goal-Oriented Rewriting Calculus

In the rest of this section we present a Constrained Goal-Oriented constructor-based
Rewriting Calculus (named CGORC) which is intended as a proof-theoretical
specification of program semantics.

A constraint $\varphi \in R^*_{\Sigma_\perp}(DV)$ is in solved form iff $\varphi \in R^*_P(DV)$ or $\varphi = x\ /\!=\ t$
or $\varphi = s \notin xs$, where $x, xs \in DV$, $t, s \in T^*_{\Sigma_\perp}(DV)$, $x \not\equiv t$. Given a finite set
$\Gamma$ of constraints in solved form, the rewriting calculus CGORC allows us to derive
statements of the form $e \to t$ (named approximation statements) and constraints
$\varphi$, where $e \in E^*_{\Sigma_\perp}(DV)$, $t \in T^*_{\Sigma_\perp}(DV)$, $\varphi \in R^*_{\Sigma_\perp}(DV)$ and $\Gamma \subseteq R^*_{\Sigma_\perp}(DV)$.

The intended meaning of an approximation statement $e \to t$ is that the
possibly partial data term $t$ approximates $e$'s value. As notation, $\Gamma \vdash_{\mathcal{P}} \chi$, where
$\chi$ is either an approximation statement or a constraint, and $\Gamma \subseteq R^*_{\Sigma_\perp}(DV)$ is
in solved form, will denote the derivability of $\chi$ from $\Gamma$ in CGORC.

The constraint symbols $==$ and $/=$ are overloaded and they must be treated
differently depending on the level they belong to. For this reason, we need the
notion of the primitive part of a finite set $\Gamma \subseteq R^*_{\Sigma_\perp}(DV)$ in solved form.
Intuitively, this is the part $PP(\Gamma)$ of $\Gamma$ depending on those variables forced (by $\Gamma$)
to take primitive (i.e., numeric) values. Formally, we define the set $primvar(\Gamma)$
of primitive variables in $\Gamma$ as the least set of variables verifying:

> If $\varphi^P \in R^*_P(DV) \cap \Gamma$ and $\varphi^P \not\equiv x\ /\!=\ y$, then $lib(\varphi^P) \subseteq primvar(\Gamma)$, where
$lib(\varphi^P)$ denotes the set of free variables of $\varphi^P$.

> If $x\ /\!=\ y \in \Gamma$ or $y\ /\!=\ x \in \Gamma$ and $x \in primvar(\Gamma)$ then $y \in primvar(\Gamma)$.

and we define the primitive part of $\Gamma$ as the set

$PP(\Gamma) = \{\varphi^P \mid \varphi^P \in R^*_P(DV) \cap \Gamma,\ \varphi^P \not\equiv x\ /\!=\ y,\ x, y \in DV\} \cup \{x\ /\!=\ y \mid x\ /\!=\ y \in \Gamma,\ x, y \in primvar(\Gamma)\}$.

For instance, if $\Gamma = \{x \leq 2,\ x\ /\!=\ y,\ y\ /\!=\ z,\ y \notin xs\}$ then $primvar(\Gamma) = \{x, y, z\}$,
and $PP(\Gamma) = \{x \leq 2,\ x\ /\!=\ y,\ y\ /\!=\ z\}$.
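
The least-fixpoint reading of primvar and PP can be sketched in Python as follows. The encoding is an assumption made for illustration only: a constraint is a triple `(op, a, b)`, variables are strings, real constants are numbers, and only the operators `"<="`, `"=="`, `"/="` and `"notin"` over such atoms are covered.

```python
def is_var(a):
    return isinstance(a, str)

def is_primitive(c):
    """Constraint built from primitive symbols and atoms only (simplified)."""
    return c[0] in {"<=", "==", "/="}

def is_var_diseq(c):
    """Constraints of the form x /= y with both sides variables."""
    return c[0] == "/=" and is_var(c[1]) and is_var(c[2])

def free_vars(c):
    return {a for a in c[1:] if is_var(a)}

def primvar(gamma):
    """Least set of variables forced by gamma to take primitive values."""
    prim = set()
    for c in gamma:
        if is_primitive(c) and not is_var_diseq(c):
            prim |= free_vars(c)
    changed = True
    while changed:                      # propagate through x /= y, both directions
        changed = False
        for c in gamma:
            if is_var_diseq(c) and (c[1] in prim) != (c[2] in prim):
                prim |= {c[1], c[2]}
                changed = True
    return prim

def primitive_part(gamma):
    prim = primvar(gamma)
    return {c for c in gamma
            if (is_primitive(c) and not is_var_diseq(c))
            or (is_var_diseq(c) and c[1] in prim and c[2] in prim)}

gamma = {("<=", "x", 2), ("/=", "x", "y"), ("/=", "y", "z"), ("notin", "y", "xs")}
```

On the set `gamma`, which encodes the instance given in the text, `primvar` yields the variables x, y, z and `primitive_part` keeps the three constraints other than the non-membership one.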

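Rules of the CGORC Calculus: Now we are ready to define the rewriting
calculus CGORC. In the following, the notation $h(\bar{e}_n)$ is a shorthand for
$h(e_1, \ldots, e_n)$, where $h \in DC^n \cup FS^n$. The calculus CGORC is composed of the
following inference rules:

• CGORC-rules for $\to$

(PR)$_1^{\to}$   $\dfrac{[\ldots]}{\Gamma \vdash_{\mathcal{P}} t^P \to s^P}$   if $dvar(t^P \to s^P) \subseteq primvar(\Gamma)$.

(PR)$_2^{\to}$   $\dfrac{\Gamma \vdash_{\mathcal{P}} e_1 \to t_1^P,\ \ \Gamma \vdash_{\mathcal{P}} e_2 \to t_2^P,\ \ \Gamma \vdash_{\mathcal{P}} t_1^P \diamond t_2^P \to t_3^P}{\Gamma \vdash_{\mathcal{P}} e_1 \diamond e_2 \to t_3^P}$   if $\diamond \in \{+, *\}$, $e_1 \diamond e_2 \in E^*_{\Sigma_\perp}(DV) - T^*_{\Sigma_\perp}(DV)$.

(B)   $\Gamma \vdash_{\mathcal{P}} e \to \perp$

(RR)   $\Gamma \vdash_{\mathcal{P}} x \to x$   if $x \in DV$, $x \notin primvar(\Gamma)$.

(DC)   $\dfrac{\Gamma \vdash_{\mathcal{P}} e_1 \to t_1,\ \ldots,\ \Gamma \vdash_{\mathcal{P}} e_n \to t_n}{\Gamma \vdash_{\mathcal{P}} c(\bar{e}_n) \to c(\bar{t}_n)}$   if $c \in DC^n$, $n \geq 0$.

(OMUT)   $\dfrac{\Gamma \vdash_{\mathcal{P}} e \to t,\ \ \Gamma \vdash_{\mathcal{P}} es \to \{[\,t' \mid ts\,]\},\ \ \Gamma \vdash_{\mathcal{P}} \{[\,t', t \mid ts\,]\} \to ts'}{\Gamma \vdash_{\mathcal{P}} \{[\,e \mid es\,]\} \to ts'}$

(OR)   $\dfrac{\Gamma \vdash_{\mathcal{P}} e_1 \to s_1\delta,\ \ldots,\ \Gamma \vdash_{\mathcal{P}} e_n \to s_n\delta,\ \ \Gamma \vdash_{\mathcal{P}} r\delta \to t,\ \ \Gamma \vdash_{\mathcal{P}} \varphi\delta}{\Gamma \vdash_{\mathcal{P}} f(\bar{e}_n) \to t}$   if $f(\bar{s}_n) \to r \Leftarrow \varphi \in R$ and $\delta \in DSub(T^*_{\Sigma_\perp}(DV))$.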

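• CGORC-rules for $R^*_{\Sigma_\perp}(DV)$

(PR)$_R$   $\dfrac{[\ldots]}{\Gamma \vdash_{\mathcal{P}} \varphi}$   if $\varphi \in R^*_P(DV)$, $lib(\varphi) \subseteq primvar(\Gamma)$.

If rule (PR)$_R$ is not applicable, then we can use the following rules, in which
the superscript S means that a symmetric rule is implicitly assumed.

(HIP)   $\Gamma \vdash_{\mathcal{P}} \varphi$   if $\varphi \in \Gamma$.

(CONJ)   $\dfrac{\Gamma \vdash_{\mathcal{P}} \varphi_1,\ \ \Gamma \vdash_{\mathcal{P}} \varphi_2}{\Gamma \vdash_{\mathcal{P}} \varphi_1 \wedge \varphi_2}$

(C)   $\dfrac{\Gamma \vdash_{\mathcal{P}} e \to t,\ \ \Gamma \vdash_{\mathcal{P}} r \to t}{\Gamma \vdash_{\mathcal{P}} e == r}$   if $t \in T^*_{\Sigma}(DV)$.

(EX)   $\dfrac{\Gamma \vdash_{\mathcal{P}} \varphi'[x/t]}{\Gamma \vdash_{\mathcal{P}} \exists x\, \varphi'}$

(LEQ)   $\dfrac{\Gamma \vdash_{\mathcal{P}} e \to t^P,\ \ \Gamma \vdash_{\mathcal{P}} r \to s^P,\ \ \Gamma \vdash_{\mathcal{P}} t^P \leq s^P}{\Gamma \vdash_{\mathcal{P}} e \leq r}$

(MEMB)$_{\to}$   $\dfrac{\Gamma \vdash_{\mathcal{P}} es \to \{[\,t \mid ts\,]\},\ \ \Gamma \vdash_{\mathcal{P}} e \in \{[\,t \mid ts\,]\}}{\Gamma \vdash_{\mathcal{P}} e \in es}$   if $es \in E^*_{\Sigma_\perp}(DV) - T^*_{\Sigma_\perp}(DV)$.

(NMEMB)$_{\to}$   $\dfrac{\Gamma \vdash_{\mathcal{P}} es \to ts,\ \ \Gamma \vdash_{\mathcal{P}} e \notin ts}{\Gamma \vdash_{\mathcal{P}} e \notin es}$   if $es \in E^*_{\Sigma_\perp}(DV) - T^*_{\Sigma_\perp}(DV)$.

(MEMB)$_1$   $\dfrac{\Gamma \vdash_{\mathcal{P}} e == t}{\Gamma \vdash_{\mathcal{P}} e \in \{[\,t \mid ts\,]\}}$

(MEMB)$_2$   $\dfrac{\Gamma \vdash_{\mathcal{P}} e \in ts}{\Gamma \vdash_{\mathcal{P}} e \in \{[\,t \mid ts\,]\}}$

(NMEMB)$_1$   $\Gamma \vdash_{\mathcal{P}} e \notin \{[\,]\}$

(NMEMB)$_2$   $\dfrac{\Gamma \vdash_{\mathcal{P}} e\ /\!=\ t,\ \ \Gamma \vdash_{\mathcal{P}} e \notin ts}{\Gamma \vdash_{\mathcal{P}} e \notin \{[\,t \mid ts\,]\}}$

(NEQ)$_{\to}$   $\dfrac{\Gamma \vdash_{\mathcal{P}} e \to t,\ \ \Gamma \vdash_{\mathcal{P}} r \to s,\ \ \Gamma \vdash_{\mathcal{P}} t\ /\!=\ s}{\Gamma \vdash_{\mathcal{P}} e\ /\!=\ r}$
if $e \not\equiv \{[\,]\}$, $r \not\equiv \{[\,]\}$ and $e\ /\!=\ r$ contains some symbol $f \in FS$, or $e$ (resp. $r$) has
the multiset constructor as outermost symbol.

(NEQ)$_1$   $\Gamma \vdash_{\mathcal{P}} c(\bar{t}_n)\ /\!=\ d(\bar{s}_m)$   if $c \in DC^n$, $d \in DC^m$, $c \not\equiv d$, $c, d \not\equiv \{[\,\cdot \mid \cdot\,]\}$.

(NEQ)$_2$   $\dfrac{\Gamma \vdash_{\mathcal{P}} t_i\ /\!=\ s_i}{\Gamma \vdash_{\mathcal{P}} c(\bar{t}_n)\ /\!=\ c(\bar{s}_n)}$   if $c \in DC^n$, $c \not\equiv \{[\,\cdot \mid \cdot\,]\}$; one rule for each $1 \leq i \leq n$.

(NEQ)$_3^S$   $\dfrac{\Gamma \vdash_{\mathcal{P}} t \notin ms}{\Gamma \vdash_{\mathcal{P}} \{[\,t \mid ts\,]\}\ /\!=\ ms}$

(NEQ)$_4$   $\dfrac{[\ldots]}{\Gamma \vdash_{\mathcal{P}} \{[\,t \mid ts\,]\}\ /\!=\ [\ldots]}$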

Some comments are needed in order to clarify the rules above. Firstly, note
that all CGORC-rules associated to $\to$ and $==$ are similar to those presented in
[4], except for rules (PR)$_i^{\to}$, $1 \leq i \leq 2$. Rule (PR)$_1^{\to}$ refers to approximation
statements between primitive terms. In this case, we defer to the primitive
level. On the other hand, (PR)$_2^{\to}$ deals with approximation statements whose
left-hand side has $+$ or $*$ as the outermost operation, but contains some symbol
in $FS$. In this case, it is enough to look for approximations for $e_i$, $1 \leq i \leq 2$,
which must be primitive terms, and finally prove the approximation statement
$t_1^P \diamond t_2^P \to t_3^P$ (this will be done by means of rule (PR)$_1^{\to}$).

With respect to the CGORC-rules associated to $R^*_{\Sigma_\perp}(DV)$, note that the
rules (HIP), (CONJ) and (EX) represent general forms of inference, accepted as
valid in the intuitionistic fragment of predicate logic; see e.g. [19]. Rule (PR)$_R$
establishes that a primitive constraint $\varphi$ which involves only primitive variables
($lib(\varphi) \subseteq primvar(\Gamma)$) must be handled at the primitive level.

The CGORC-rules for $\in$ and $\notin$ are quite intuitive. Let us see now how
to prove a disequality between two expressions $e$ and $r$. Rule (NEQ)$_{\to}$ has two
tasks: to get approximations to the values represented by $e$ and $r$ (removing
all function symbols) and, in the presence of multisets, to reorder the elements by
applying the CGORC-rule (OMUT). Disequality between terms not containing
the multiset constructor at head can be checked by detecting disagreement of
constructor symbols at head (rule (NEQ)$_1$) or by proving the disequality between
some of the arguments (rule (NEQ)$_2$). In order to prove a disequality between
two multisets $ms$ and $ms'$ we proceed as done in [3].

To conclude the discussion, note that a constraint of the form $e \leq r$, where
$e \leq r \notin R^*_P(DV)$, must be proved by using the CGORC-rule (LEQ), which
"evaluates" $e$ and $r$ in order to get primitive terms, and then proves the
corresponding constraint between such primitive terms.

Semi-extended Data Terms and restricted CGORC Derivations: The
calculus CGORC that we have just defined will be used in Section 4 to prove
soundness of a goal solving mechanism. Completeness of goal solving will rely on
a restricted use of CGORC-provability. In order to introduce this idea, we define
the set $T^*_{DC_\perp}(DV)$ of semi-extended partial data terms as the subset of $T^*_{\Sigma_\perp}(DV)$
composed of all those extended partial data terms not containing the symbols $+$,
$*$, $0$ and $1$. Similarly, the set $T^*_{DC}(DV) \subseteq T^*_{DC_\perp}(DV)$ of semi-extended total data
terms is composed of those semi-extended partial data terms not containing the
constant symbol $\perp$. Note that a semi-extended primitive term is either a variable
or an element from $\mathbb{R}$.

We will use the notation $\Vdash_{\mathcal{P}} \chi$ to indicate that $\chi$ can be proved (from
$\Gamma = \emptyset$) using a restricted CGORC derivation. By definition, this means a
CGORC derivation built according to the following limitations:

> Substitutions used in rule (OR) must map $DV$ to $T^*_{DC_\perp}(DV)$.

> Rules (PR)$_2^{\to}$, (OMUT), (C), (NEQ)$_{\to}$, (MEMB)$_{\to}$, (NMEMB)$_{\to}$ and (LEQ)
must use approximation statements $e \to t$ with $t \in T^*_{DC_\perp}(DV)$.

Trivially, $\Vdash_{\mathcal{P}} \chi$ implies $\vdash_{\mathcal{P}} \chi$. As a consequence of Theorem 3 below, the
converse implication will also hold. Semi-extended data terms will be useful also
for the construction of free term models in Section 3.







3 Model-Theoretic Semantics 

In this section we present a model-theoretic semantics, showing also its relation
to the rewriting calculus CGORC and its restricted use. We will make use of
several basic notions from the theory of semantic domains. A poset with bottom
$\perp$ is any set $S$ partially ordered by $\sqsubseteq$, with least element $\perp$. $Def(S)$ denotes the
set of all maximal elements $u \in S$, also called totally defined. $X \subseteq S$ is a directed
set iff for all $u, v \in X$ there exists $w \in X$ s.t. $u, v \sqsubseteq w$. $X$ is a cone iff $\perp \in X$ and
$X$ is downward closed w.r.t. $\sqsubseteq$. $X$ is an ideal iff $X$ is a directed cone. We write
$\mathcal{C}(S)$ and $\mathcal{I}(S)$ for the sets of cones and ideals of $S$, respectively. $\mathcal{I}(S)$ ordered
by set inclusion $\subseteq$ is a poset with bottom $\{\perp\}$, called the ideal completion of
$S$. Mapping each $u \in S$ into the principal ideal $\langle u \rangle = \{v \in S \mid v \sqsubseteq u\}$ gives an
order preserving embedding. It is known that $\mathcal{I}(S)$ is the least cpo $D$ s.t. $S$ can
be embedded into $D$. Due to these results, our semantic constructions below
could be reformulated in terms of Scott domains. In particular, totally defined
elements $u \in Def(S)$ correspond to finite and maximal elements $\langle u \rangle$ in the ideal
completion.
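
For finite posets, the notions of cone, directed set, ideal and principal ideal can be checked mechanically. The sketch below is illustrative only: the poset is passed explicitly as a carrier set `S` and an order predicate `leq`, and the flat poset used as an example (a bottom element below two incomparable values) is an assumption chosen for the demonstration.

```python
def is_cone(X, S, leq, bottom):
    """X contains bottom and is downward closed within S."""
    return bottom in X and all(v in X for u in X for v in S if leq(v, u))

def is_directed(X, leq):
    """Every pair of elements of X has an upper bound inside X."""
    return all(any(leq(u, w) and leq(v, w) for w in X) for u in X for v in X)

def is_ideal(X, S, leq, bottom):
    """An ideal is a directed cone."""
    return is_cone(X, S, leq, bottom) and is_directed(X, leq)

def principal_ideal(u, S, leq):
    """All elements of S below u."""
    return frozenset(v for v in S if leq(v, u))

# Flat poset: "bot" is below everything; 1 and 2 are incomparable.
S = {"bot", 1, 2}
leq = lambda a, b: a == "bot" or a == b
```

In this flat poset the whole carrier is a cone but not an ideal, since 1 and 2 have no common upper bound, while each principal ideal is an ideal.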

As in [9,4], to represent non-deterministic lazy functions we use models with 
posets as carriers, interpreting function symbols as monotonic mappings from 
elements to cones. The elements of the poset are viewed as finite approximations 
of possibly infinite values. For given posets D and E, we define the set of all non- 
deterministic and deterministic functions from D to E, respectively as follows: 

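$[D \to_{nd} E] = \{f : D \to \mathcal{C}(E) \mid \forall u, u' \in D : (u \sqsubseteq_D u' \Rightarrow f(u) \subseteq f(u'))\}$

$[D \to_{d} E] = \{f \in [D \to_{nd} E] \mid \forall u \in D : f(u) \in \mathcal{I}(E)\}$

Note that any non-deterministic function $f$ can be extended to a monotonic
mapping $f^* : \mathcal{C}(D) \to \mathcal{C}(E)$ defined as $f^*(C) = \bigcup_{c \in C} f(c)$. Abusing notation,
we will identify $f$ with its extension $f^*$.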
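
The extension of a non-deterministic function from elements to cones is just a union over the cone. A minimal Python sketch follows, assuming a flat poset of numbers with a bottom element `BOT`; the function `coin` is a made-up example of a non-deterministic function and does not come from the paper.

```python
BOT = "bot"

def coin(n):
    """Hypothetical non-deterministic function: n may yield n or n + 1."""
    if n == BOT:
        return frozenset({BOT})
    return frozenset({BOT, n, n + 1})

def extend(f):
    """Lift f from elements to cones: f*(C) is the union of f(c) for c in C."""
    def f_star(cone):
        return frozenset().union(*(f(c) for c in cone))
    return f_star

coin_star = extend(coin)
```

Because `coin` is monotonic on the flat poset, `coin_star` is monotonic with respect to set inclusion, as the definition requires.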



3.1 Specification of $/=$, $\in$ and $\notin$ by Horn Clauses



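Let us now define the behaviour of predicate symbols in $PS$ (except for $==$)
by means of Horn clauses. Note that all Horn clauses below have a direct
correspondence with the rewriting calculus CGORC, and they will determine the
class of models of a program $\mathcal{P}$. The Horn clauses are the following:

> $x \in \{[\,y \mid ys\,]\} \Leftarrow x == y$

> $x \in \{[\,y \mid ys\,]\} \Leftarrow x \in ys$

> $x \notin \{[\,]\}$

> $x \notin \{[\,y \mid ys\,]\} \Leftarrow x\ /\!=\ y,\ x \notin ys$

> $c(\bar{x}_n)\ /\!=\ d(\bar{y}_m)$   (for $c \not\equiv d$, $c, d \not\equiv \{[\,\cdot \mid \cdot\,]\}$)

> $c(\bar{x}_n)\ /\!=\ c(\bar{y}_n) \Leftarrow x_i\ /\!=\ y_i$   (for $c \not\equiv \{[\,\cdot \mid \cdot\,]\}$; one clause for each $1 \leq i \leq n$)

> $\{[\,x \mid xs\,]\}\ /\!=\ ys \Leftarrow x \notin ys$

> $xs\ /\!=\ \{[\,y \mid ys\,]\} \Leftarrow y \notin xs$

> $\{[\,x \mid xs\,]\}\ /\!=\ \{[\,y \mid ys\,]\} \Leftarrow x == y,\ xs\ /\!=\ ys$

> $\{[\,x \mid xs\,]\}\ /\!=\ \{[\,y \mid ys\,]\} \Leftarrow \{[\,x \mid xs\,]\} \to \{[\,x' \mid xs'\,]\},\ \{[\,y \mid ys\,]\} \to \{[\,y' \mid ys'\,]\},\ \{[\,x' \mid xs'\,]\}\ /\!=\ \{[\,y' \mid ys'\,]\}$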
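
For finite multisets of ground atomic values, the clauses for membership, non-membership and multiset disequality can be read as the following Python sketch. It is a simplification under stated assumptions: multisets are tuples, elements are compared with Python equality, and the reordering allowed by (mset) is realised by removing a matching element rather than by approximation statements. The function names are hypothetical.

```python
def member(x, ms):
    """x is a member of {[y|ys]} if x == y, or if x is a member of ys."""
    if not ms:
        return False
    y, ys = ms[0], ms[1:]
    return x == y or member(x, ys)

def not_member(x, ms):
    """x is not in {[ ]}; x is not in {[y|ys]} if x /= y and x is not in ys."""
    if not ms:
        return True
    y, ys = ms[0], ms[1:]
    return x != y and not_member(x, ys)

def remove_one(x, ms):
    """Drop one copy of x from ms (x is assumed to occur in ms)."""
    i = ms.index(x)
    return ms[:i] + ms[i + 1:]

def mset_neq(xs, ys):
    """Disequality of multisets, following the shape of the clauses above."""
    if any(not_member(x, ys) for x in xs):   # some x of xs is not in ys
        return True
    if any(not_member(y, xs) for y in ys):   # symmetric case
        return True
    if not xs:                               # both are empty at this point
        return False
    x = xs[0]                                # x also occurs in ys: cancel one copy
    return mset_neq(xs[1:], remove_one(x, ys))
```

As a sanity check, the result of `mset_neq` can be compared against `collections.Counter` inequality on small inputs.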
3.2 Algebras and Models

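We are now prepared to introduce our algebras, extending those from [4].
Formally, a polymorphic $\Sigma$-algebra has the following structure:

$\mathcal{A} = \langle D^{\mathcal{A}}, T^{\mathcal{A}}, :^{\mathcal{A}}, \{K^{\mathcal{A}}\}_{K \in \{Real\} \cup TC}, \{h^{\mathcal{A}}\}_{h \in PO \cup DC}, \{\diamond^{\mathcal{A}}\}_{\diamond \in PP \cup PS}, \{f^{\mathcal{A}}\}_{f \in FS} \rangle$

where:

(1) $D^{\mathcal{A}}$ (data universe) is a poset with partial order $\sqsubseteq^{\mathcal{A}}$ and bottom element $\perp^{\mathcal{A}}$.
Furthermore, $D^{\mathcal{A}}$ contains a copy $D^{\mathcal{A}}_P$ of $\mathbb{R}$ given by a bijective mapping
$h_P$ from $D^{\mathcal{A}}_P$ to $\mathbb{R}$, and verifies that: if $d \sqsubseteq^{\mathcal{A}} d'$ and $d' \in D^{\mathcal{A}}_P$ (resp. $d \in D^{\mathcal{A}}_P$),
then $d = \perp^{\mathcal{A}}$ or $d = d'$ (resp. $d' = d$).

(2) $T^{\mathcal{A}}$ (type universe) is any non-empty set.

(3) $:^{\mathcal{A}} \subseteq D^{\mathcal{A}} \times T^{\mathcal{A}}$ is a binary relation such that for all $\ell \in T^{\mathcal{A}}$, it holds that
$\mathcal{E}^{\mathcal{A}}(\ell) = \{d \in D^{\mathcal{A}} \mid d :^{\mathcal{A}} \ell\} \in \mathcal{C}(D^{\mathcal{A}})$ and $\mathcal{E}^{\mathcal{A}}(Real^{\mathcal{A}}) = D^{\mathcal{A}}_P \cup \{\perp^{\mathcal{A}}\}$.

(4) For each $K \in TC^n$, $K^{\mathcal{A}} : (T^{\mathcal{A}})^n \to T^{\mathcal{A}}$ is a function.

(5) $0^{\mathcal{A}} = \{h_P^{-1}(0^{\mathcal{R}}), \perp^{\mathcal{A}}\} = \langle h_P^{-1}(0^{\mathcal{R}}) \rangle$ and $1^{\mathcal{A}} = \{h_P^{-1}(1^{\mathcal{R}}), \perp^{\mathcal{A}}\} = \langle h_P^{-1}(1^{\mathcal{R}}) \rangle$.

(6) For any $d_1, d_2 \in D^{\mathcal{A}}$, $\diamond \in \{+, *\}$: $d_1 \diamond^{\mathcal{A}} d_2 = \{h_P^{-1}(h_P(d_1) \diamond^{\mathcal{R}} h_P(d_2)), \perp^{\mathcal{A}}\} =
\langle h_P^{-1}(h_P(d_1) \diamond^{\mathcal{R}} h_P(d_2)) \rangle$, if $d_1, d_2 \in D^{\mathcal{A}}_P$, and $d_1 \diamond^{\mathcal{A}} d_2 = \langle \perp^{\mathcal{A}} \rangle$ otherwise.

(7) For each $c \in DC^n$, $c^{\mathcal{A}} \in [(D^{\mathcal{A}})^n \to_d (D^{\mathcal{A}} - D^{\mathcal{A}}_P)]$ and satisfies: for all
$d_i \in D^{\mathcal{A}}$, $1 \leq i \leq n$, there exists $d \in (D^{\mathcal{A}} - D^{\mathcal{A}}_P)$ such that $c^{\mathcal{A}}(d_1, \ldots, d_n)
= \langle d \rangle$. Furthermore, if $d_i$ are totally defined, $1 \leq i \leq n$, then $d$ is totally
defined.

(8) For all $d_1, d_2 \in D^{\mathcal{A}}$: $(d_1, d_2) \in\ \leq^{\mathcal{A}}$ iff $d_1, d_2 \in D^{\mathcal{A}}_P$ and $h_P(d_1) \leq h_P(d_2)$.

(9) For all $d_1, d_2 \in D^{\mathcal{A}}$: $(d_1, d_2) \in\ ==^{\mathcal{A}}$ iff $d_1, d_2$ are totally defined and $d_1 = d_2$.

(10) For all $\diamond \in \{/=, \in, \notin\}$, $\diamond^{\mathcal{A}} \subseteq (D^{\mathcal{A}})^2$ and it is monotonic. Moreover:

- If $d_1, d_2 \in D^{\mathcal{A}}_P$ and $h_P(d_1) \neq h_P(d_2)$ then $(d_1, d_2) \in\ /\!=^{\mathcal{A}}$.

- If $(d_1, d_2) \in\ /\!=^{\mathcal{A}}$ and $d_2 \in D^{\mathcal{A}}_P$ (resp. $d_1 \in D^{\mathcal{A}}_P$), then $d_1 \in D^{\mathcal{A}}_P$ (resp.
$d_2 \in D^{\mathcal{A}}_P$) and $h_P(d_1) \neq h_P(d_2)$.

(11) For all $f \in FS^n$, $f^{\mathcal{A}} \in [(D^{\mathcal{A}})^n \to_{nd} D^{\mathcal{A}}]$.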
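
The way primitive operations are lifted to a data universe with bottom, as in items (5), (6) and (8), can be sketched as follows. The encoding is an assumption made for illustration: primitive elements are Python numbers (so the bijection with the reals is the identity), `BOT` stands for the bottom element, any other value stands for a non-primitive element, and cones are frozensets.

```python
import operator

BOT = "bot"

def principal(d):
    """Principal ideal of d in a flat universe: d together with bottom."""
    return frozenset({d, BOT})

def is_primitive(d):
    return isinstance(d, (int, float)) and not isinstance(d, bool)

def lift(op):
    """Interpret a primitive binary operation on the lifted universe."""
    def lifted(d1, d2):
        if is_primitive(d1) and is_primitive(d2):
            return principal(op(d1, d2))
        return principal(BOT)          # undefined on non-primitive arguments
    return lifted

plus = lift(operator.add)
times = lift(operator.mul)

def leq(d1, d2):
    """The ordering constraint only holds between primitive elements."""
    return is_primitive(d1) and is_primitive(d2) and d1 <= d2
```

With this reading, `plus(1.5, 2)` denotes the cone containing 3.5 and bottom, while any application to bottom or to a constructed value collapses to the cone containing only bottom.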

In the following, $Alg(\Sigma)$ will denote the class of polymorphic $\Sigma$-algebras.
Note that item (7) ensures that constructors are interpreted as deterministic
mappings that preserve finite and maximal elements. Furthermore, note also
that all primitive symbols in $\Sigma_P$ are interpreted in any algebra $\mathcal{A}$ according to
their standard meaning in $\mathcal{R}$.

A valuation in $\mathcal{A}$ has the form $\xi = (\mu, \eta)$, where $\mu : TV \to T^{\mathcal{A}}$ is a type
valuation and $\eta : DV \to D^{\mathcal{A}}$ is a data valuation. $\eta$ is called totally defined iff
$\eta(x)$ is totally defined, for all $x \in DV$. $Val(\mathcal{A})$ denotes the set of all valuations
over $\mathcal{A}$.

For a given $\xi = (\mu, \eta) \in Val(\mathcal{A})$, type denotations $[\![\tau]\!]^{\mathcal{A}}\xi = [\![\tau]\!]^{\mathcal{A}}\mu \in T^{\mathcal{A}}$
and extended partial expression denotations $[\![e]\!]^{\mathcal{A}}\xi = [\![e]\!]^{\mathcal{A}}\eta \in \mathcal{C}(D^{\mathcal{A}})$ are defined
as usual, by considering that every $d \in \mathbb{R}$ is interpreted in $\mathcal{A}$ as $\langle h_P^{-1}(d) \rangle =
\{h_P^{-1}(d), \perp^{\mathcal{A}}\}$.

We are particularly interested in those algebras that are well-behaved w.r.t.
types. We say that $\mathcal{A} \in Alg(\Sigma)$ is well-typed iff for all $h : (\tau_1, \ldots, \tau_n) \to \tau_0 \in
DC_\perp \cup FS \cup PO$, we have that $h^{\mathcal{A}}(\mathcal{E}^{\mathcal{A}}([\![\tau_1]\!]^{\mathcal{A}}\mu), \ldots, \mathcal{E}^{\mathcal{A}}([\![\tau_n]\!]^{\mathcal{A}}\mu)) \subseteq \mathcal{E}^{\mathcal{A}}([\![\tau_0]\!]^{\mathcal{A}}\mu)$,
for every type valuation $\mu$. Also, for given $\xi = (\mu, \eta) \in Val(\mathcal{A})$, we say that $\xi$
is well-typed w.r.t. an environment $V$ iff $\eta(x) \in \mathcal{E}^{\mathcal{A}}([\![\tau]\!]^{\mathcal{A}}\mu)$ for every $x : \tau \in V$.

Reasoning by structural induction, we can prove that expression denotations
behave as expected w.r.t. well-typed algebras and valuations:

Theorem 1. Let $V$ be an environment. Let $\mathcal{A} \in Alg(\Sigma)$ be well-typed and $\xi =
(\mu, \eta) \in Val(\mathcal{A})$ well-typed w.r.t. $V$. For all $e \in E^{*\tau}_{\Sigma_\perp}(V)$, $[\![e]\!]^{\mathcal{A}}\eta \subseteq \mathcal{E}^{\mathcal{A}}([\![\tau]\!]^{\mathcal{A}}\mu)$.

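Next we define the notion of model. Consider $\mathcal{A} \in Alg(\Sigma)$. Let $\eta$ be a data
valuation over $\mathcal{A}$. Then:

> $(\mathcal{A}, \eta) \models e_1 \diamond e_2$, $\diamond \in PP \cup PS$, iff there exist $d_i \in [\![e_i]\!]^{\mathcal{A}}\eta$, $1 \leq i \leq 2$, such
that $(d_1, d_2) \in \diamond^{\mathcal{A}}$.

> $(\mathcal{A}, \eta) \models \varphi_1 \wedge \varphi_2$ iff $(\mathcal{A}, \eta) \models \varphi_i$, $1 \leq i \leq 2$.

> $(\mathcal{A}, \eta) \models \exists x\, \varphi$ iff there exists $d \in D^{\mathcal{A}}$ such that $(\mathcal{A}, \eta[x \leftarrow d]) \models \varphi$.

> $(\mathcal{A}, \eta) \models e \to e'$ iff $[\![e']\!]^{\mathcal{A}}\eta \subseteq [\![e]\!]^{\mathcal{A}}\eta$.

> $\mathcal{A} \models \{[\,x, y \mid xs\,]\} \approx \{[\,y, x \mid xs\,]\}$ iff for any $\eta$, it holds that
$[\![\{[\,x, y \mid xs\,]\}]\!]^{\mathcal{A}}\eta = [\![\{[\,y, x \mid xs\,]\}]\!]^{\mathcal{A}}\eta$.

> $\mathcal{A}$ satisfies a Horn clause of the form $A \Leftarrow B_1, \ldots, B_m$ iff for any data
valuation $\eta$ such that $(\mathcal{A}, \eta) \models B_i$, $1 \leq i \leq m$, it holds that $(\mathcal{A}, \eta) \models A$.

> $\mathcal{A}$ satisfies a defining rule $e \to r \Leftarrow \varphi$ iff every data valuation $\eta$ such that
$(\mathcal{A}, \eta) \models \varphi$ verifies that $(\mathcal{A}, \eta) \models e \to r$.

> Let $\mathcal{P} = \langle \Sigma, R \rangle$ be a program. $\mathcal{A} \models \mathcal{P}$ iff $\mathcal{A}$ satisfies every defining rule in $R$
and (mset).

The class $\mathcal{M} \subseteq Alg(\Sigma)$ of polymorphic algebras is composed of those $\mathcal{A} \in
Alg(\Sigma)$ such that $\mathcal{A}$ satisfies (mset) and all Horn clauses in Subsection 3.1,
whereas $\mathcal{M}(\mathcal{P}) =_{def} \{\mathcal{A} \in \mathcal{M} \mid \mathcal{A} \models \mathcal{P}\}$.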

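Definition 1. (Logical consequence) Consider $\Gamma \subseteq R^*_{\Sigma_\perp}(DV)$ in solved form
and $\chi = e \to t$ or $\chi \in R^*_{\Sigma_\perp}(DV)$, where $e \in E^*_{\Sigma_\perp}(DV)$, $t \in T^*_{\Sigma_\perp}(DV)$. It holds
that $\Gamma \models_{\mathcal{P}} \chi$ iff for all $\mathcal{A} \in \mathcal{M}(\mathcal{P})$, and for all $\eta \in Val(\mathcal{A})$ totally defined, if
$(\mathcal{A}, \eta) \models \Gamma$ then $(\mathcal{A}, \eta) \models \chi$.

The following theorem establishes the soundness of the rewriting calculus
CGORC. It can be proved by induction on CGORC derivations.

Theorem 2. (Soundness of CGORC) Consider $\Gamma \subseteq R^*_{\Sigma_\perp}(DV)$ in solved
form, and $\chi \in R^*_{\Sigma_\perp}(DV)$ or $\chi = e \to t$, where $e \in E^*_{\Sigma_\perp}(DV)$, $t \in T^*_{\Sigma_\perp}(DV)$. If
$\Gamma \vdash_{\mathcal{P}} \chi$ then $\Gamma \models_{\mathcal{P}} \chi$.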

3.3 Free Term Models 

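Given a program $\mathcal{P} = \langle \Sigma, R \rangle$ and an environment $V$, we define the term algebra
$\mathcal{M}_T(\mathcal{P}, V)$ as follows:

• Let $X$ be the set of data variables occurring in $V$. The set $T^*_{DC_\perp}(X)/\!\approx_{Mset}$ is
the data universe, where $T^*_{DC_\perp}(X)/\!\approx_{Mset}\ =_{def} \{[t] \mid t \in T^*_{DC_\perp}(X)\}$. $[t]$ is defined
as the set $\{t' \in T^*_{DC_\perp}(X) \mid t \approx_{Mset} t'\}$, where $t \approx_{Mset} t'$ iff $\Vdash_{\mathcal{P}} t \to t'$ and
$\Vdash_{\mathcal{P}} t' \to t$.

The bottom element is $[\perp] = \{\perp\}$, and the partial order $\sqsubseteq^{\mathcal{M}_T(\mathcal{P}, V)}$ is defined
as: for any $[t], [t'] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$, $[t] \sqsubseteq^{\mathcal{M}_T(\mathcal{P}, V)} [t']$ iff $\Vdash_{\mathcal{P}} t' \to t$.
Note that $T^*_{DC_\perp}(X)/\!\approx_{Mset}$ contains $\{[d] \mid d \in \mathbb{R}\}$ as a copy of $\mathbb{R}$ given by the
bijective mapping $h_P([d]) = d$, for all $d \in \mathbb{R}$.

• Let $A$ be the set of type variables occurring in $V$. The type universe is $Type(A) =
\{\tau \in Type_{TC}(TV) \mid tvar(\tau) \subseteq A\}$.

• For all $[t] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$, $\tau \in Type(A)$: $[t] :^{\mathcal{M}_T(\mathcal{P}, V)} \tau$ iff $t \in T^{*\tau}_{DC_\perp}(V)$.

• For all $K \in TC^n \cup \{Real\}$, $\tau_i \in Type(A)$: $K^{\mathcal{M}_T(\mathcal{P}, V)}(\tau_1, \ldots, \tau_n) = K(\tau_1, \ldots, \tau_n)$.

• $0^{\mathcal{M}_T(\mathcal{P}, V)} = \langle [0] \rangle$ and $1^{\mathcal{M}_T(\mathcal{P}, V)} = \langle [1] \rangle$.

• For all $[t], [t'] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$, $\diamond \in \{+, *\}$: $[t] \diamond^{\mathcal{M}_T(\mathcal{P}, V)} [t'] = \langle [t \diamond^{\mathcal{R}} t'] \rangle$, if
$t, t' \in \mathbb{R}$, and $[t] \diamond^{\mathcal{M}_T(\mathcal{P}, V)} [t'] = \langle [\perp] \rangle$ otherwise.

• For all $c \in DC^n$, $[t_i] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$: $c^{\mathcal{M}_T(\mathcal{P}, V)}([t_1], \ldots, [t_n]) = \langle [c(t_1, \ldots, t_n)] \rangle$.

• For all $[t_i] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$: $([t_1], [t_2]) \in\ \leq^{\mathcal{M}_T(\mathcal{P}, V)}$ iff $t_1, t_2 \in \mathbb{R}$ and $t_1 \leq t_2$.

• For all $[t_i] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$: $([t_1], [t_2]) \in\ ==^{\mathcal{M}_T(\mathcal{P}, V)}$ iff $[t_1] = [t_2]$ and $t_1 \in T^*_{DC}(X)$.

• For all $[t_i] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$, $\diamond \in \{/=, \in, \notin\}$: $([t_1], [t_2]) \in \diamond^{\mathcal{M}_T(\mathcal{P}, V)}$ iff $\Vdash_{\mathcal{P}} t_1 \diamond t_2$.

• For all $[t_i] \in T^*_{DC_\perp}(X)/\!\approx_{Mset}$, $1 \leq i \leq n$, $f \in FS^n$:
$f^{\mathcal{M}_T(\mathcal{P}, V)}([t_1], \ldots, [t_n]) = \{[t] \in T^*_{DC_\perp}(X)/\!\approx_{Mset} \mid\ \Vdash_{\mathcal{P}} f(t_1, \ldots, t_n) \to t\}$.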

The next result ensures that $\mathcal{M}_T(\mathcal{P}, V)$ belongs to the class of algebras $\mathcal{M}(\mathcal{P})$,
and that restricted CGORC-derivability is sound and complete with respect to
our notion of model.

Theorem 3. (Adequateness of $\mathcal{M}_T(\mathcal{P}, V)$) Let $\mathcal{P} = \langle \Sigma, R \rangle$ be a program and
$V$ an environment. It holds:

(1) $\mathcal{M}_T(\mathcal{P}, V) \in \mathcal{M}(\mathcal{P})$ and if $\mathcal{P}$ is well-typed then $\mathcal{M}_T(\mathcal{P}, V)$ is well-typed.

(2) Consider $\chi = e \to t$ or $\chi \in R^*_{\Sigma_\perp}(X)$, where $e \in E^*_{\Sigma_\perp}(X)$, $t \in T^*_{DC_\perp}(X)$.
Then the following statements are equivalent:

(2.1) $\Vdash_{\mathcal{P}} \chi$.

(2.2) $(\mathcal{A}, \eta) \models \chi$, for all $\mathcal{A} \in \mathcal{M}(\mathcal{P})$ and for all $\eta$ totally defined.

(2.3) $(\mathcal{M}_T(\mathcal{P}, V), [id]) \models \chi$, where $id$ is the identity substitution over $X$.

Due to Theorem 2, unrestricted CGORC-derivability is sound w.r.t. models
in the class $\mathcal{M}(\mathcal{P})$. Nevertheless, completeness in the sense of Theorem 3
holds only for restricted CGORC-derivability. For instance, it is true that
$(\mathcal{M}_T(\mathcal{P}, V), [id]) \models x + 1 \to x + 1$, but there is no CGORC proof for $\vdash_{\mathcal{P}} x + 1 \to
x + 1$. The point is that $x + 1$ is not a semi-extended data term.

We can also give a characterization of $\mathcal{M}_T(\mathcal{P}, V)$ as a free object in the
category of all models of $\mathcal{P}$. This relies on a notion of morphism similar to that
from [4], extended to deal with constraints in a natural way. See [2] for details.

4 Operational Semantics 

This section presents a Lazy Narrowing Calculus (LNC for short), which is a
goal solving procedure that combines lazy narrowing with unification modulo
(mset) and constraint solving. In order to ensure the completeness of LNC
(Theorem 4), the process of solving a goal is divided into two main phases, as
done in [10,5]. A derivation for a goal $G$ (composed of constraints) is a finite
sequence of $\leadsto_{\mathcal{P}}$-steps (named a $\leadsto_{\mathcal{P}}$-derivation) followed by a finite sequence
of $\leadsto_{DV}$-steps (named a $\leadsto_{DV}$-derivation). The $\leadsto_{\mathcal{P}}$-derivation transforms $G$
into a quasi-solved goal $G'$ containing only approximation statements of the
form $t \to x$ and constraints in solved form, while the $\leadsto_{DV}$-derivation processes
variables, thereby transforming $G'$ into a solved goal which represents a solution
for $G$. The whole process preserves well-typing.

Formally, an admissible goal $G$ for a program $\mathcal{P}$ has the structure $G \equiv \exists \bar{u} \cdot
S \,\Box\, P \,\Box\, R$, where:

> $S \equiv x_1 = t_1, \ldots, x_n = t_n$ is a system of equations in solved form, i.e.,
$x_i$ occurs exactly once in $G$. Note that $S$ represents the substitution $\delta_S \in
DSub(T_\Sigma(DV))$ defined as $\delta_S(x_i) = t_i$, $1 \leq i \leq n$, and $\delta_S(z) = z$ otherwise.

> $P \equiv e_1 \to s_1, \ldots, e_k \to s_k$ is a multiset of approximation statements.

> $R \subseteq R_\Sigma(DV)$ is a multiset of constraints, and must fulfill several technical
requirements similar to those presented in [9,5] in order to achieve soundness
and completeness, along with the new condition:

(PR) For all $e \diamond t \in S \cup P$, $\diamond \in \{=, \to\}$, $t$ does not contain $0$, $1$, $+$ and $*$.

$G \equiv \exists \bar{u} \cdot S \,\Box\, P \,\Box\, R$ is well-typed iff there exists an environment $V$ such that
for all $e \diamond e' \in S \cup P$, $\diamond \in \{\to, =\}$, there exists $\tau \in Type_{TC}(TV)$ such that
$e, e' \in E^{\tau}_{\Sigma}(V)$ and for all $\varphi \in R$, $\varphi$ is well-typed w.r.t. $V$.



4.1 Demanded and Pending Variables 

Similarly to [9,5], LNC uses a notion of demanded variable to deal with lazy
evaluation. Intuitively, approximation statements $e \to x$ in $G$, where $e$ contains
some function symbol, do not propagate the binding $x/e$. Instead, evaluation of
$e$ must be triggered, provided that $x$ is demanded. The result will be shared by
all the occurrences of $x$.

A variable $x$ of $G \equiv \exists \bar{u} \cdot S \,\Box\, P \,\Box\, R$ is demanded iff $P$ contains a sequence of
approximation statements of the form: $t_0 \to x_1, t_1 \to x_2, \ldots, t_n \to x_{n+1}$, where
$t_i \in T_\Sigma(DV)$, $0 \leq i \leq n$, $x \in dvar(t_0)$, $x_i \in DV$, $1 \leq i \leq n+1$, $x_i \in dvar(t_i)$,
$1 \leq i \leq n$, and one of the following conditions holds:

> $x_{n+1} \diamond e \in R$ or $e \diamond x_{n+1} \in R$, where $\diamond \in \{==, /=, \in\}$.

> $e \notin x_{n+1} \in R$ or $x_{n+1} \notin \{[\,e \mid es\,]\} \in R$.

> $x_{n+1} \in lib(\varphi^P)$, where $\varphi^P \in R_P(DV) \cap R$.

Demanded variables $x$ are forced by solutions to take values different from
$\perp$. This justifies why $x$ is not considered as demanded in constraints $x \notin es$,
where $es$ does not have the form $\{[\,r \mid rs\,]\}$.

We also need a notion of pending variable to detect situations where goal
solving must proceed by trying different imitation bindings. Formally, given a
goal $G \equiv \exists \bar{u} \cdot S \,\Box\, P \,\Box\, R$, we say that $x_n$ is pending in $G$ iff $P$ contains a sequence of
the form $e \to x_0, t_1 \to x_1, t_2 \to x_2, \ldots, t_n \to x_n$, where $e \in E_\Sigma(DV) - T_\Sigma(DV)$,
$t_i \in T_\Sigma(DV)$, $1 \leq i \leq n$, $x_i \in DV$, $0 \leq i \leq n$, and $x_i \in dvar(t_{i+1})$, $0 \leq i \leq n-1$.
The set of pending variables of $G$ will be denoted by $pend(G)$ in the sequel.
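
The set of pending variables can be computed as a least fixpoint over the approximation statements in P. The following Python sketch assumes that the variable declared pending is the last one of the chain, and it abstracts each statement as a triple `(lhs_vars, lhs_is_data_term, rhs_var)`; both the encoding and the function name `pending` are hypothetical.

```python
def pending(P):
    """P: iterable of (set of variables in lhs, lhs is a data term?, rhs variable)."""
    P = list(P)
    # Base case: e -> x0 where e is not a data term (it contains a function symbol).
    pend = {x for (_, is_data, x) in P if not is_data}
    changed = True
    while changed:
        changed = False
        for lhs_vars, is_data, x in P:
            # Step: t -> x with t a data term mentioning an already pending variable.
            if is_data and x not in pend and lhs_vars & pend:
                pend.add(x)
                changed = True
    return pend

# Example chain: f(u) -> x0, c(x0, v) -> x1, and an unrelated statement c(w) -> x2.
P = [({"u"}, False, "x0"), ({"x0", "v"}, True, "x1"), ({"w"}, True, "x2")]
```

In the example, `x0` is pending because it receives the value of a function call, `x1` is pending because it is bound to a data term that mentions `x0`, and `x2` is not pending.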

As an example, consider a goal of the form $\exists \bar{u} \cdot S \,\Box\, e \to x \,\Box\, x \notin xs$, where $e$
contains defined function symbols and possibly denotes an infinite value. Variable
$x$ is not demanded and the constraint $x \notin xs$ is solved, but the whole goal is not
solved. Since $x$ is pending, the LNC calculus will allow goal solving to progress
by trying two possible imitation bindings for $xs$: binding $xs$ to $\{[\,]\}$ will lead
immediately to a solved goal, while binding $xs$ to $\{[\,y \mid ys\,]\}$ will produce a new
goal including a constraint $x \notin \{[\,y \mid ys\,]\}$, which can be further processed.



4.2 Goals in Solved Form, Answers and Solutions 

An admissible goal $G$ is quasi-solved iff $G$ has the form $\exists \bar{u} \cdot S \,\Box\, P \,\Box\, R$, where
$R$ contains only constraints in solved form or equalities $x == y$, $x, y \in DV$,
and $P$ contains only approximation statements of the form $t \to x$, where $t \in
T_\Sigma(DV) - DV$ and $x$ is not a demanded variable, or $t \in DV$. Note that it is
important to require $x$ to be not demanded in such approximation statements.
This is needed in order to preserve quasi-solved goals when applying variable
elimination rules. For instance, if we allow the goal $G \equiv \Box\, A \to x \,\Box\, x\ /\!=\ A$, where
$A$ is a constant symbol (note that $x$ is demanded), then when propagating the
binding $x/A$, the resulting goal $G' \equiv \Box\,\Box\, A\ /\!=\ A$ would not be quasi-solved.

An admissible goal $G$ is in solved form iff $G$ has the form $\exists \bar{u} \cdot S \,\Box\,\Box\, R$, where
$R$ contains only constraints in solved form, and for all $x == y \in R$, $x, y \in DV$,
it holds that $x == y \in PP(G)$, where $PP(G)$ is defined similarly to $PP(\Gamma)$ but
now considering that also equality constraints $x == y$ contribute to $PP(G)$.

With respect to primitive constraints, LNC is not going to perform any explicit
transformation. Without loss of generality, we can suppose that all LNC-irreducible
goals undergo a process of simplification similar to that in CLP($\mathcal{R}$). This is
possible because all the primitive constraints within a goal $G$ in solved form will be
isolated in the primitive part $PP(G)$.

A correct answer for $G \equiv \exists \bar{u} \cdot S \,\Box\, P \,\Box\, R$ is a pair $(\delta, \Gamma)$ such that $\delta \in
DSub(T^*_{DC}(DV))$, $\Gamma \subseteq R^*_{\Sigma_\perp}(DV)$ is finite and in solved form and there
exists $\bar{t} \in T^*_{DC_\perp}(DV)$ such that the substitution $\delta'$ that extends $\delta$ by binding the
existential variables $\bar{u}$ to $\bar{t}$ (called an existential extension of $\delta$ in
the sequel) verifies that:

> There exists a primitive valuation $\eta^P$ over $\mathcal{R}$ such that $(\mathcal{R}, \eta^P) \models_{\mathcal{R}} PP(\Gamma)$.

> For every equation $x = s \in S$: $x\delta' \equiv s\delta' \in T^*_{DC}(DV)$.

> For all $\chi \in P \cup R$, it holds that $\Gamma \vdash_{\mathcal{P}} \chi\delta'$. A multiset containing CGORC-proofs
for all the elements in $(P \,\Box\, R)\delta'$ is named a witness $\mathcal{M}$ for $G$ and
$(\delta, \Gamma)$.

A solution for $G$ is a correct answer $(\delta, \Gamma)$ for $G$ such that $\Gamma = \emptyset$. Note that
the condition (PR) in admissible goals ensures that every $e \to t \in G$ verifies that
$t$ does not contain the symbols $+$, $*$, $0$ and $1$. But Theorems 2 and 3 ensure that
$\vdash_{\mathcal{P}}$ and $\Vdash_{\mathcal{P}}$ are equivalent in the presence of approximation statements of the form
$e \to t$, where $e \in E^*_{\Sigma_\perp}(DV)$, $t \in T^*_{DC_\perp}(DV)$, and constraints $\varphi \in R^*_{\Sigma_\perp}(DV)$.
Thus, given any solution $\delta \in DSub(T^*_{DC}(DV))$ for $G$ and $\chi \in P \cup R$, to prove
$\vdash_{\mathcal{P}} \chi\delta'$ is equivalent to proving $\Vdash_{\mathcal{P}} \chi\delta'$, where $\delta'$ is an existential extension of
$\delta$. This means that the notion of solution for a goal can be established in terms
of restricted CGORC-derivations.

On the contrary, the notion of correct answer needs the full power of unrestricted
CGORC-derivations. For instance, $(\{\,\}, \{x == y\})$ happens to be an LNC
computed answer for the goal $G \equiv c(x + 1) == c(y + 1)$. According to the Soundness
Theorem 4 below, this computed answer must be correct, and there must be
some unrestricted CGORC-derivation proving $\{x == y\} \vdash_{\mathcal{P}} c(x + 1) == c(y + 1)$.



4.3 LNC Transformation Rules. Soundness and Completeness 

Due to lack of space, we will not present all the transformation
rules for LNC completely, but we will give the main ideas behind them.
A complete description of LNC can be found in [2].

All transformation rules in LNC have been designed in order to get a
completeness result (Theorem 4) whose proof is based on the following idea: Given
an admissible goal $G$ different from FAIL and not solved, and a solution $\delta$ for $G$,
it is possible to find a transformation rule T such that when we apply T to $G$,
"something decreases" while preserving (possibly modulo (mset)) the solution
$\delta$. Here, FAIL represents an irreducible and inconsistent goal.

In the case of non quasi-solved goals, “something” refers to a witness, and $\delta$ is
totally preserved. By “decreasing” we refer to the following multiset ordering: let
$M = \{[\, \Pi_1, \ldots, \Pi_n \,]\}$ and $M' = \{[\, \Pi'_1, \ldots, \Pi'_m \,]\}$ be multisets of $\Vdash_P$-proofs; then
$M$ is smaller than $M'$ iff $\{[\, |\Pi_1|, \ldots, |\Pi_n| \,]\} \prec \{[\, |\Pi'_1|, \ldots, |\Pi'_m| \,]\}$, where $|\Pi|$ is
the size of $\Pi$ (i.e., the number of inference steps, without counting the applications
of the rule (PR)), and $\prec$ is the multiset extension of the usual ordering
over the natural numbers.
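
The multiset extension of the ordering on natural numbers can be illustrated with a small sketch. It is not taken from the paper: the function name `multiset_less` and the use of Python lists of proof sizes are assumptions made only for illustration, and the definition used is the standard Dershowitz-Manna one.

```python
from collections import Counter


def multiset_less(m, n):
    """Return True if multiset m is strictly smaller than multiset n.

    Standard multiset extension of < on natural numbers: m < n iff
    m != n and every element that m has in excess over n is dominated
    by a strictly larger element that n has in excess over m.
    """
    cm, cn = Counter(m), Counter(n)
    if cm == cn:
        return False
    excess_m = cm - cn  # elements occurring more often in m than in n
    excess_n = cn - cm  # elements occurring more often in n than in m
    return all(any(y > x for y in excess_n) for x in excess_m)


# Replacing one proof of size 2 by two proofs of size 1 is a decrease.
smaller = multiset_less([3, 1, 1], [3, 2])
```

Under this ordering a transformation step may replace one proof by several proofs, provided each of them is strictly smaller than the one removed.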

In the case of quasi-solved goals, “something” refers to $G$, and $\delta$ is preserved
modulo (mset). Now, by “decreasing” we refer to the following lexicographic
ordering: given any two quasi-solved goals $G \equiv \exists\bar{u} \cdot S \,\square\, P \,\square\, R$ and
$G' \equiv \exists\bar{u}' \cdot S' \,\square\, P' \,\square\, R'$, we say that $G$ is smaller than $G'$ iff $(n_1, m_1) < (n_2, m_2)$, where
$n_1$ (resp. $n_2$) is the number of approximation statements in $G$ (resp. in $G'$),
whereas $m_1$ (resp. $m_2$) is the number of constraints of the form $x == y$ such
that $x == y \notin PP(G)$ (resp. $x == y \notin PP(G')$).

Now, in order to design LNC, it is enough to analyze the different kinds of
approximation statements and constraints in a not yet solved admissible goal $G$,
looking for some transformation that ensures the decrease described
above. This is done by analyzing the possible structure of a goal $G$. In the analysis
below, $\delta$ will denote a solution for $G$ and $\delta'$ will be an existential extension
of $\delta$. With respect to approximation statements, let us analyze some of the
possibilities. If $G$ is of the form:

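◇ $G \equiv \exists\bar{u} \cdot S \,\square\, c(\bar{e}_n) \to d(\bar{t}_m), P \,\square\, R$, where $c \in DC^n$, $d \in DC^m$. The definition
of solution establishes that $c(\bar{e}_n)\delta' \to d(\bar{t}_m)\delta'$ must be $\Vdash_P$-provable. Such a
proof must use as first inference rule (DC) or (OMUT). According to these two
CGORC-rules, we have, respectively, the following two $\hookrightarrow_P$-rules:

Dec$_\to$: $\exists\bar{u} \cdot S \,\square\, c(\bar{e}_n) \to c(\bar{t}_n), P \,\square\, R \;\hookrightarrow_P\; \exists\bar{u} \cdot S \,\square\, e_1 \to t_1, \ldots, e_n \to t_n, P \,\square\, R$

Mut$_\to$: $\exists\bar{u} \cdot S \,\square\, \{[\, e \mid es \,]\} \to s, P \,\square\, R \;\hookrightarrow_P\;$
$\exists\bar{u}, x, y, xs \cdot S \,\square\, e \to x,\; es \to \{[\, y \mid xs \,]\},\; \{[\, y, x \mid xs \,]\} \to s, P \,\square\, R$

where $x$, $y$, $xs$ are fresh variables.

◇ $G \equiv \exists\bar{u} \cdot S \,\square\, f(\bar{e}_n) \to t, P \,\square\, R$, where $f \in FS^n$. Now, the only possibility of
proving $\Vdash_P f(\bar{e}_n)\delta' \to t\delta'$ is using as last inference rule (OR) but ensuring that
$t\delta' \neq \bot$. Hence, the $\hookrightarrow_P$-transformation rule will be:

Rule$_\to$: $\exists\bar{u} \cdot S \,\square\, f(\bar{e}_n) \to t, P \,\square\, R \;\hookrightarrow_P\; \exists\bar{u}, \bar{x} \cdot S \,\square\, e_1 \to t_1, \ldots, e_n \to t_n, r \to t, P \,\square\, \varphi, R$
if $t \notin DV$ or $t$ is a demanded variable (which ensures that $t\delta' \neq \bot$), where
$f(\bar{t}_n) \to r \Leftarrow \varphi$ is a fresh variant of a rule in $R$ with variables $\bar{x}$.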

The remaining cases can be reasoned similarly; see [2]. With respect to constraints,
let us analyze several cases:

▷ $G \equiv \exists\bar{u} \cdot S \,\square\, P \,\square\, e \in es, R$, where $es \notin DV$. Then, if $es$ contains a function
symbol, according to (MEMB), we will have the rule:

Memb: $\exists\bar{u} \cdot S \,\square\, P \,\square\, e \in es, R \;\hookrightarrow_P\; \exists\bar{u}, x, xs \cdot S \,\square\, es \to \{[\, x \mid xs \,]\}, P \,\square\, e \in \{[\, x \mid xs \,]\}, R$
where $x$, $xs$ are fresh variables.

Otherwise ($es$ has the form $\{[\, t \mid ts \,]\}$), we will have two new rules, according
to the CGORC-rules (MEMB)$_1$ and (MEMB)$_2$ respectively, that can be
designed similarly to Memb.

▷ $G \equiv \exists\bar{u} \cdot S \,\square\, P \,\square\, e \notin xs, R$, where $xs \in DV$. If $e$ contains some function symbol
in $FS$, or $e \in T_\Sigma(DV)$ but $e$ contains some variable of the set pend($G$), then let
us analyze all possible forms of the proof $\Vdash_P e\delta' \notin xs\delta'$. If such a proof has used
(NMEMB)$_1$, then we have the following LNC-transformation rule:

Nmemb$_1$: $\exists\bar{u} \cdot S \,\square\, P \,\square\, e \notin xs, R \;\hookrightarrow_P\; \exists\bar{u} \cdot xs = \{[\ ]\}, (S \,\square\, P \,\square\, R)[xs/\{[\ ]\}]$

Otherwise, the proof $\Vdash_P e\delta' \notin xs\delta'$ has used (NMEMB)$_2$, and the corresponding
LNC-rule is defined analogously.

The rest of the rules for solving constraints can be designed similarly. Something
similar happens with $\hookrightarrow_{DV}$-rules. In the presence of approximation statements
of the form $t \to x$, where $t$ is either a non-variable primitive term or a
variable belonging to primvar($G$), we generate a new goal transforming $t \to x$
into $t == x$. Otherwise, $t \to x$ disappears, propagating the binding $x/t$.

The soundness and completeness theorem for LNC, whose proof can be found
in [2], is stated below. As notation, $\delta' =_{Mset} \delta$ means that $\delta'(x) \approx_{Mset} \delta(x)$, for
all $x \in DV$.

Theorem 4. (Soundness and completeness)

Soundness: Let $G$ be an initial goal, $G'$ a quasi-solved goal and $G'' \equiv \exists\bar{u} \cdot S \,\square\, \square\, R$
a goal in solved form such that $G \hookrightarrow_P^* G' \hookrightarrow_{DV}^* G''$. Then $(\delta_S, R)$ is a correct
answer for $G$.

Completeness: Let $\mathcal{P} = \langle \Sigma, \mathcal{R} \rangle$ be a program, $G$ an initial goal and $\delta$ a solution
for $G$. Then there exist a quasi-solved goal $G'$ and a goal $G''$ in solved form such
that $G \hookrightarrow_P^* G' \hookrightarrow_{DV}^* G'' \equiv \exists\bar{u} \cdot S \,\square\, \square\, R$, and $G''$ has a solution $\delta'$ verifying that
$\delta' =_{Mset} \delta$. Furthermore, if $\mathcal{P}$ and $G$ are well-typed then $G\delta_S$ is well-typed.

Note that our completeness result is restricted to solutions instead of correct
answers. For instance, consider the program rules $f \to Zero$ and $g(y, ys) \to
True \Leftarrow y \notin ys$, and the initial goal $G \equiv \square\, \square\, g(Suc(f), zs) == True$. It is
easy to check that $(\{\,\}, \{Suc(Zero) \notin zs\})$ is a correct answer for $G$. However,
LNC cannot compute the correct answer $(\{\,\}, \{Suc(Zero) \notin zs\})$. Instead, LNC
enumerates infinitely many solutions $zs = \{[\ ]\}$, $zs = \{[\, Zero \,]\}$, ..., i.e., the correct answer
is covered by an infinite number of solutions.



5 Conclusions 

Starting from the framework of [4,5], limited to the case where all the data
constructors except the multiset constructor are free, we have proposed the language
SETA, which differs from other related languages (e.g. [8,7,13]) in its
rich combination of lazy functions, datatypes and arithmetic constraints. SETA
has a firm mathematical foundation, with proof-theoretic and model-theoretic
semantics, as well as a sound and complete goal solving mechanism. Moreover,
SETA seems to have a potentially wide range of applications, including parsing of
visual languages, which are worth further investigation.

Our narrowing calculus is not intended to serve, “as it is”, as a concrete 
computational model. An actual implementation should avoid an indiscriminate 
use of the multiset equation (mset), which can obviously lead to useless infinite 
computations. A first attempt to build such an implementation has been pre- 
sented in our previous paper [3], where neither mathematical foundations nor 
built-in arithmetic constraints were considered. More work on implementation 
methods is still needed, also with regard to efficient narrowing strategies such as
demand-driven narrowing, also called needed narrowing [16,1], whose generalization to the
case of non-free data constructors does not seem obvious.



Abstract. Interaction nets are graphical rewriting systems which can
be used as either a high-level programming paradigm or a low-level
implementation language. However, an operational semantics, together with
notions of strategy and normal form which are essential to reason about
implementations, are not easy to formalize in this graphical framework.
The purpose of this paper is to study a textual calculus for interaction
nets, with a formal operational semantics, which provides a foundation
for implementation. In addition, we are able to specify in this calculus
various strategies, and a type system which formalizes the notion of
partition used to define semi-simple nets. The resulting system can be seen
as a kernel for a programming language, analogous to the $\lambda$-calculus.



1 Introduction 

Interaction nets, introduced by Lafont [12], offer a graphical paradigm of computation
based on net rewriting. They have proven themselves successful for application
in computer science, most notably with the coding of the $\lambda$-calculus, where
optimal reduction (specifically Lamping’s algorithm [13]) has been achieved [9].

Although the graphical representation of interaction nets is very intuitive, 
graphical interfaces (editors) have not been forthcoming; a textual language 
is therefore necessary. Such a language would require ways of representing the 
interaction nets and rules, together with a reduction system to express how a rule 
should be applied. Lafont suggested in [12] a rather beautiful textual notation 
for interaction rules, but a general study of it has not emerged. 

In this paper we define a calculus of interaction nets based on this notation. 
We provide the coding of interaction nets and rules, and a reduction system 
which can be seen as a decomposed system of interaction. Various notions of 
normal form and strategies for reduction can be formalized in this calculus, which 
provides the starting point for a more general treatment of abstract machines for 
implementing interaction nets. To enforce a discipline of programming, a type 
assignment system with user-defined types is introduced, which incorporates the 
notion of partition used by Danos and Regnier to generalize the multiplicative 
connectives of linear logic [5], and by Lafont to define semi-simple nets [12]. 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 170-187, 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999

Apart from the fact that it simplifies the actual writing of programs, a formal,
textual, operational account of interaction nets has many advantages. Static
properties of nets, such as types, can be defined in a more concise and formal way
(compare the definition of the type system given in Sect. 3 and the definitions
given in e.g. [6,12]). By giving a formal account of the rewriting process, the
calculus provides the basis of an implementation of interaction nets. Interaction
nets are strongly confluent, but as in all rewriting systems, there exist different
notions of strategies and normal forms (for instance, irreducible nets, or weak
normal forms associated with lazy reduction strategies). These are hard to formalize
in a graphical framework, but we will see that they can be precisely defined in
the calculus. Such strategies have applications for encodings of the $\lambda$-calculus,
where interaction nets have had the greatest impact, and where a notion of a
strategy is required to avoid non-termination of disconnected nets (see [16]).

Reduction algorithms and strategies also play a crucial role in the study of 
the operational semantics of interaction nets, in particular, operational equiva- 
lences of nets. Applications of this include [7] where the definition of a strategy 
was essential to show the correspondence between bisimilarity and contextual 
equivalence. In [7] an informal textual notation was used throughout the paper 
to help writing nets, but the formal definitions of strategy of evaluation and op- 
erational equivalence had to be given in the graphical framework. The calculus 
defined in this paper provides a formal and uniform notation for writing nets 
and defining their properties. 

Related Work. Banach [3] showed that interaction nets are closely related 
to connection graphs of Bawden [4] and gave a formal account of these for- 
malisms via hypergraph rewriting, using a categorical approach. Honda and 
Yoshida [10,18,19] studied various graphical and textual process calculi that 
generalize interaction nets; their emphasis is on the study of concurrent computations.
The Interaction Systems of Laneve [14] are a class of combinatory
reduction systems closely related to interaction nets (the intuitionistic nets). 
Strategies have been well studied in this framework, in particular for optimal 
reduction. Related to this work is also the encoding of interaction nets as com- 
binatory reduction systems given in [8]. The notations that we introduce in the 
present work are inspired by formalisms for cyclic rewriting [2], and proof ex- 
pressions for linear logic [1]. 

Overview. The rest of this paper is structured as follows: In the next section 
we recall some basic preliminaries on interaction nets and present several textual 
languages for interaction nets. Section 3 gives a thorough study of the calculus 
and presents the type system. Section 4 shows how strategies and reduction 
algorithms can be easily expressed in this framework. Finally, we conclude the 
paper in Section 5. 



2 Background 

An interaction net system is specified by a set $\Sigma$ of symbols, and a set $\mathcal{R}$ of
interaction rules. Each symbol $\alpha \in \Sigma$ has an associated (fixed) arity. An occurrence
of a symbol $\alpha \in \Sigma$ will be called an agent. If the arity of $\alpha$ is $n$, then the
agent has $n + 1$ ports: a distinguished one called the principal port depicted by
an arrow, and $n$ auxiliary ports corresponding to the arity of the symbol. We
will say that the agent has $n + 1$ free ports.

A net $N$ built on $\Sigma$ is a graph (not necessarily connected) with agents at
the vertices. The edges of the net connect agents together at the ports such
that there is only one edge at every port (edges may connect two ports of the
same agent). A net may also have edges with free extremes, called wires, and
their extremes are called ports by analogy. The interface of a net is the set of
free ports it has. There are two special instances of a net: a wiring (no agents),
and the empty net. A pair of agents $(\alpha, \beta) \in \Sigma^2$ connected together on their
principal ports is called an active pair; it is the interaction net analogue of a redex.
An interaction rule $((\alpha, \beta) \Longrightarrow N) \in \mathcal{R}$ replaces an occurrence of the active
pair $(\alpha, \beta)$ by the net $N$. Rules have to satisfy two very strong conditions: the
interface must be preserved, and there is at most one rule for each pair of agents.
The following diagram illustrates the idea, where $N$ is any net built from $\Sigma$.




As a running example, we use the system of interaction for proof nets of
multiplicative linear logic, which consists of two agents, $\otimes$ and $\parr$, of arity 2, and
one interaction rule. The following diagram indicates the interaction rule and an
example net.




Three textual languages for interaction nets have been proposed in the literature.
Inspired by Combinatory Reduction Systems [11], a net can be “flattened”
by replacing ports of agents by names, and representing edges by two occurrences
of a name. The expression $\otimes(a, b, c), \parr(a, d, d), \parr(b, e, e)$ represents the example
net above, where the first argument of each agent is the principal port. The
interaction rule is written: $\otimes(a, b, c), \parr(a, d, e) \longrightarrow I(c, e), I(b, d)$ where $I(a, b)$ is a
wire with extremes $a$, $b$. This language has been used as a notation for interaction
nets in [7,8], and is quite straightforward to relate to the graphical notation.

A second textual notation [15] eliminates the variables. The ports of each
agent are indexed $\alpha.i$, with the principal port given by $\alpha.0$. A wiring relation is
used to express the connectivity of the ports of agents in a net: $\alpha.i \equiv \beta.j$ indicates
that there is a link between port $i$ of agent $\alpha$ and port $j$ of agent $\beta$. For example,
$\otimes \bowtie \parr \longrightarrow (\emptyset, \{\parr.1 \equiv \otimes.1, \parr.2 \equiv \otimes.2\})$ represents the interaction rule for
proof nets. The left-hand side denotes an active pair and the right-hand side is
a set of agents (empty in this example) together with a wiring.

Lafont [12] proposed a textual notation for interaction rules, where the example
rule is written as $\otimes(x, y) \bowtie \parr(x, y)$. This notation is much lighter
syntactically, but more esoteric. It seems the most useful as a formal language for
interaction nets, although some practice is needed to become acquainted with
it. This notation inspired the calculus that we develop in the rest of the paper.

3 The Calculus 

We begin by introducing a number of syntactic categories. 

Agents: Let $\Sigma$ be a set of symbols, ranged over by $\alpha, \beta, \ldots$, each with a given
arity $ar : \Sigma \to \mathbb{N}$. An occurrence of a symbol will be called an agent. The
arity of a symbol corresponds precisely to the number of auxiliary ports.
Names: Let $\mathcal{N}$ be a set of names, ranged over by $x, y, z$, etc. $\mathcal{N}$ and $\Sigma$ are
assumed disjoint.

Terms: A term is built on $\Sigma$ and $\mathcal{N}$ by the grammar: $t ::= x \mid \alpha(t_1, \ldots, t_n)$,
where $x \in \mathcal{N}$, $\alpha \in \Sigma$, $ar(\alpha) = n$ and $t_1, \ldots, t_n$ are terms, with the restriction
that each name can appear at most twice. If $n = 0$, then we omit the parentheses.
If a name occurs twice in a term, we say that it is bound, otherwise it
is free. Since free names occur exactly once, we say that terms are linear. We
write $\vec{t}$ for a list of terms $t_1, \ldots, t_n$. A term of the form $\alpha(\vec{t})$ can be seen as a
tree with the principal port of $\alpha$ at the root, and where the terms $t_1, \ldots, t_n$
are the subtrees connected to the auxiliary ports of $\alpha$.

Equations: If $t$ and $u$ are terms, then the (unordered) pair $t = u$ is an equation.
$\Delta, \Theta, \ldots$ will be used to range over multisets of equations. Examples of
equations include: $x = \alpha(\vec{t})$, $x = y$, $\alpha(\vec{t}) = \beta(\vec{u})$.

Rules: Rules are pairs of terms written as $\alpha(\vec{t}) \bowtie \beta(\vec{u})$, where $(\alpha, \beta) \in \Sigma^2$ is
the active pair of the rule. All names occur exactly twice in a rule, and there
is at most one rule for each pair of agents.

Definition 1 (Names in terms). The set $\mathcal{N}(t)$ of names of a term $t$ is defined
in the following way, which extends to multisets of equations and rules in the
obvious way.

$\mathcal{N}(x) = \{x\}$

$\mathcal{N}(\alpha(t_1, \ldots, t_n)) = \mathcal{N}(t_1) \cup \cdots \cup \mathcal{N}(t_n)$

Given a term, we can replace its free names by new names, provided the linearity 
restriction is preserved. 

Definition 2 (Renaming). The notation $t[y/x]$ denotes a renaming that replaces
the free occurrence of $x$ in $t$ by a new name $y$. Note that since the
name $x$ occurs exactly once in the term, this operation can be implemented
directly as an assignment, as is standard in the linear case. This notion extends
to equations, and multisets of equations in the obvious way.

More generally, we consider substitutions that replace free names in a term
by other terms, always assuming that the linearity restriction is preserved.

Definition 3 (Substitution). The notation $t[u/x]$ denotes a substitution that
replaces the free occurrence of $x$ by the term $u$ in $t$. We only consider substitutions
that preserve the linearity of the terms. Note that renaming is a particular case
of substitution.

Lemma 1. Assume that $x \notin \mathcal{N}(v)$. If $y \in \mathcal{N}(u)$ then $t[u/x][v/y] = t[u[v/y]/x]$,
otherwise $t[u/x][v/y] = t[v/y][u/x]$.
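
The definitions of names and substitution, and the two cases of Lemma 1, can be tried out on a small sketch. The representation is an assumption made for illustration only: a name is a Python string, an agent term is a tuple `(symbol, arg1, ..., argn)`, and the helper names `names`, `free_names`, `is_linear` and `subst` are hypothetical.

```python
def names(t):
    """Name occurrences of term t, left to right, with repetitions."""
    if isinstance(t, str):
        return [t]
    return [x for arg in t[1:] for x in names(arg)]


def free_names(t):
    """Names occurring exactly once; a name occurring twice is bound."""
    occ = names(t)
    return {x for x in occ if occ.count(x) == 1}


def is_linear(t):
    """Grammar restriction: no name occurs more than twice."""
    occ = names(t)
    return all(occ.count(x) <= 2 for x in occ)


def subst(t, u, x):
    """t[u/x]; assumes x is free in t, so it occurs exactly once."""
    if isinstance(t, str):
        return u if t == x else t
    return (t[0],) + tuple(subst(arg, u, x) for arg in t[1:])


Z = ("Z",)  # nullary agent

# Lemma 1, first case: y occurs in u.
t1, u1 = ("Add", "x", "z"), ("S", "y")
case1_left = subst(subst(t1, u1, "x"), Z, "y")
case1_right = subst(t1, subst(u1, Z, "y"), "x")

# Lemma 1, second case: y does not occur in u.
t2, u2 = ("Add", "x", "y"), ("S", "w")
case2_left = subst(subst(t2, u2, "x"), Z, "y")
case2_right = subst(subst(t2, Z, "y"), u2, "x")
```

Because a free name occurs exactly once, `subst` rewrites a single leaf of the tree; an implementation with mutable cells could perform the same operation as one assignment, as Definition 2 remarks.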



Definition 4 (Instance of a rule). If $r$ is a rule $\alpha(t_1, \ldots, t_n) \bowtie \beta(u_1, \ldots, u_m)$,
then $\hat{r} = \alpha(t'_1, \ldots, t'_n) \bowtie \beta(u'_1, \ldots, u'_m)$ denotes a new generic instance of $r$, that
is, a copy of $r$ where we introduce a new set of names.

We now have all the machinery that we need to define interaction nets. 

Definition 5 (Configurations). A configuration is a pair: $c = (\mathcal{R}, \langle \vec{t} \mid \Delta \rangle)$,
where $\mathcal{R}$ is a set of rules, $\vec{t}$ a multiset $\{t_1, \ldots, t_n\}$ of terms, and $\Delta$ a multiset of
equations. Each variable occurs at most twice in $c$. If a name occurs once in $c$
then it is free, otherwise it is bound. For simplicity we sometimes omit $\mathcal{R}$ when
there is no ambiguity. We use $c$, $c'$ to range over configurations. We call $\vec{t}$ the
head and $\Delta$ the body of a configuration.

Intuitively, $\langle \vec{t} \mid \Delta \rangle$ represents a net that we evaluate using $\mathcal{R}$; $\Delta$ gives the set
of active pairs and the renamings of the net. The roots of the terms in the head
of the configuration and the free names correspond to ports in the interface of
the net. We work modulo $\alpha$-equivalence for bound names as usual, but also for
free names. Configurations that differ only on the names of the free variables are
equivalent, since they represent the same net.

Example 1 (Configurations). The empty net is represented by $\langle\ \mid\ \rangle$, and the
configuration $\langle\ \mid \parr(a, a) = \otimes(b, b) \rangle$ represents a net without an interface, containing
an active pair. A configuration $\langle \vec{t} \mid\ \rangle$ represents a net without active pairs and
without cycles of principal ports.

There is an obvious (although not unique) translation between the graphical
representation of interaction nets, and the configurations that we are using.
Briefly, to translate a net into a configuration, we first orient the net as a
collection of trees, with all principal ports facing in the same direction. Each pair
of trees connected at their principal ports is translated as an equation, and any
tree whose root is free or any free port of the net goes in the head of the
configuration. We give a simple example to explain this translation. The usual encoding
of the addition of natural numbers uses the agents $\Sigma = \{Z, S, Add\}$, $ar(Z) = 0$,
$ar(S) = 1$, $ar(Add) = 2$. The diagrams below illustrate the net representing the
addition $1 + 0$ in the “usual” orientation, and also with all the principal ports
facing up.

We then obtain the configuration $\langle x \mid S(Z) = Add(x, Z) \rangle$ where the only port
in the interface is $x$, which we put in the head of the configuration.

The reverse translation simply requires that we draw the trees for the terms, 
connect the common variables together, and connect the trees corresponding to 
the members of an equation together on their principal ports. 

Remark 1. The multiset $\vec{t}$ is the interface of the configuration, and this can
be all, or just part of, the interface of the net represented by $\langle \vec{t} \mid \Delta \rangle$. For
example, the net consisting of the agents $Succ$ and $True$ can be represented
by $\langle True \mid y = Succ(x) \rangle$, $\langle True, Succ(x) \mid\ \rangle$, or $\langle\ \mid x = True, y = Succ(z) \rangle$. In
other words, we can select a part of the interface of the net to be displayed
as “observable” interface; the same net can be represented by several different
configurations. See [7] for an example of use of observable interface.



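Definition 6 (Computation Rules). The operational behaviour of the system
is given by a set of four computation rules.

Indirection: If $x \in \mathcal{N}(u)$, then $x = t, u = v \longrightarrow u[t/x] = v$.

Interaction: If $\alpha(t'_1, \ldots, t'_n) \bowtie \beta(u'_1, \ldots, u'_m) \in \mathcal{R}$, then

$\alpha(t_1, \ldots, t_n) = \beta(u_1, \ldots, u_m) \longrightarrow t_1 = t'_1, \ldots, t_n = t'_n, u_1 = u'_1, \ldots, u_m = u'_m$

Context: If $\Delta \longrightarrow \Delta'$, then $\langle \vec{t} \mid \Gamma, \Delta, \Gamma' \rangle \longrightarrow \langle \vec{t} \mid \Gamma, \Delta', \Gamma' \rangle$.

Collect: If $x \in \mathcal{N}(\vec{t})$, then $\langle \vec{t} \mid x = u, \Delta \rangle \longrightarrow \langle \vec{t}[u/x] \mid \Delta \rangle$.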

The calculus puts in evidence the real cost of implementing an interaction 
step, which involves generating an instance, i.e. a new copy, of the right-hand 
side of the rule, plus renamings (rewirings). Of course this also has to be done 
when working in the graphical framework, even though it is often seen as an 
atomic step. The calculus therefore shows explicitly how an interaction step is 
performed. 
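
To make the cost of an interaction step concrete, the four computation rules can be sketched as a small reducer. This is only an illustrative sketch under stated assumptions, not an implementation from the paper: names are Python strings, agent terms are tuples `(symbol, arg1, ..., argn)`, a configuration is a list of head terms plus a list of equations, the rule table `RULES` is keyed by pairs of symbols, fresh names are built with a numeric suffix (user names are assumed not to use that suffix), and the strategy (leftmost applicable equation first) is arbitrary.

```python
import itertools

_counter = itertools.count()


def names(t):
    if isinstance(t, str):
        return [t]
    return [x for a in t[1:] for x in names(a)]


def subst(t, u, x):
    if isinstance(t, str):
        return u if t == x else t
    return (t[0],) + tuple(subst(a, u, x) for a in t[1:])


def instance(rule):
    """Fresh copy of a rule: every name is replaced by a new one."""
    lhs, rhs = rule
    ren = {x: f"{x}_{next(_counter)}" for x in sorted(set(names(lhs) + names(rhs)))}

    def rename(t):
        if isinstance(t, str):
            return ren[t]
        return (t[0],) + tuple(rename(a) for a in t[1:])

    return rename(lhs), rename(rhs)


def step(rules, head, body):
    """Apply one computation rule, or return None if none applies."""
    for i, (l, r) in enumerate(body):
        rest = body[:i] + body[i + 1:]
        if not isinstance(l, str) and not isinstance(r, str):
            # Interaction: equations are unordered, so try both orientations.
            rule = rules.get((l[0], r[0]))
            if rule is None:
                rule, l, r = rules.get((r[0], l[0])), r, l
            if rule is None:
                continue
            il, ir = instance(rule)
            return head, rest + list(zip(l[1:], il[1:])) + list(zip(r[1:], ir[1:]))
        for x, t in ((l, r), (r, l)):
            if not isinstance(x, str):
                continue
            # Indirection: x = t, u = v  -->  u[t/x] = v
            for j, (u, v) in enumerate(rest):
                if x in names(u):
                    return head, rest[:j] + [(subst(u, t, x), v)] + rest[j + 1:]
                if x in names(v):
                    return head, rest[:j] + [(u, subst(v, t, x))] + rest[j + 1:]
            # Collect: x occurs in the head
            if any(x in names(h) for h in head):
                return [subst(h, t, x) for h in head], rest
    return None


def normalise(rules, head, body, max_steps=1000):
    for _ in range(max_steps):
        nxt = step(rules, head, body)
        if nxt is None:
            return head, body
        head, body = nxt
    raise RuntimeError("no normal form reached within max_steps")


Z = ("Z",)
RULES = {
    ("Add", "S"): (("Add", ("S", "x"), "y"), ("S", ("Add", "x", "y"))),
    ("Add", "Z"): (("Add", "x", "x"), Z),
}
# The net for 1 + 0: <a | Add(a, Z) = S(Z)>
result_head, result_body = normalise(RULES, ["a"], [(("Add", "a", Z), ("S", Z))])
```

The Context rule is implicit in the loop over the body: an equation is rewritten in place while the remaining equations are carried along unchanged. The `max_steps` bound is there because reduction need not terminate for arbitrary nets.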

Example 2 (Natural Numbers). We show two different encodings of natural numbers.
The first is the standard one; the second is a more efficient version which
offers a constant-time addition operation.

1. Let $\Sigma = \{Z, S, Add\}$ with $ar(Z) = 0$, $ar(S) = 1$, $ar(Add) = 2$, and $\mathcal{R}$:

$Add(S(x), y) \bowtie S(Add(x, y))$

$Add(x, x) \bowtie Z$

As shown previously, the net for $1 + 0$ is given by the configuration $(\mathcal{R}, \langle a \mid
Add(a, Z) = S(Z) \rangle)$. One possible sequence of rewrites for this net is the
following:

$\langle a \mid Add(a, Z) = S(Z) \rangle$

$\longrightarrow \langle a \mid a = S(x'), y' = Z, Z = Add(x', y') \rangle$

$\longrightarrow^* \langle S(x') \mid Z = Add(x', Z) \rangle$

$\longrightarrow \langle S(x') \mid x'' = x', x'' = Z \rangle$

$\longrightarrow^* \langle S(Z) \mid\ \rangle$

2. Let $\Sigma = \{S, N, N^*\}$, $ar(S) = 1$, $ar(N) = ar(N^*) = 2$. Numbers are represented
as a list of $S$ agents, where $N$ is the constructor holding a pointer to
the head and tail of the list. 0 is defined by $N(x, x)$, and $n$ by $N(S^n(x), x)$.
The operation of addition can then be encoded by the configuration

$\langle N(b, c), N^*(a, b), N^*(c, a) \mid\ \rangle$

which simply appends two numbers. There is only one rule that we need:
$N(a, b) \bowtie N^*(b, a)$, which is clearly a constant-time operation. To show how
this works, we give an example of the addition $1 + 1$:

$\langle N(b, c) \mid N(S(x), x) = N^*(a, b), N(S(y), y) = N^*(c, a) \rangle$

$\longrightarrow^* \langle N(b, c) \mid b = S(a), a = S(c) \rangle$

$\longrightarrow^* \langle N(S(S(c)), c) \mid\ \rangle$

Example 3 (Proof Nets for Multiplicative Linear Logic). The usual encoding of
proof nets in interaction nets uses $\Sigma = \{\otimes, \parr\}$, $ar(\otimes) = ar(\parr) = 2$, and the
interaction rule: $\otimes(x, y) \bowtie \parr(x, y)$. We show a configuration representing the net
used as a running example in the previous section, and a reduction sequence:

$\langle c \mid \parr(a, a) = \otimes(\parr(b, b), c) \rangle$

$\longrightarrow \langle c \mid x_1 = a, y_1 = a, x_1 = \parr(b, b), y_1 = c \rangle$

$\longrightarrow^* \langle c \mid c = \parr(b, b) \rangle$

$\longrightarrow \langle \parr(b, b) \mid\ \rangle$

The next example shows how we can capture semi-simple nets [12], where
vicious circles of principal ports cannot be created during computation. The
notion of partition was used in [12] with the purpose of defining this class of
nets. For each agent $\alpha \in \Sigma$ the (possibly empty) set of auxiliary ports is divided
into one or several classes, each of them called a partition. A partition mapping
establishes, for each agent in $\Sigma$, the way its auxiliary ports are grouped into
partitions. We recall the graphical definition of semi-simple nets and at the
same time show the translation to configurations.

Example 4 (Semi-simple Nets). Semi-simple nets are inductively defined as:

1. The empty net is semi-simple, and is represented by the configuration $\langle\ \mid\ \rangle$.

2. A wire is a semi-simple net, and is represented by a configuration of the form
$\langle y, y \mid\ \rangle$.

3. If $N_1$, $N_2$ are semi-simple nets, their juxtaposition (called Mix) gives a new
semi-simple net which is represented by the configuration $\langle \vec{s}, \vec{t} \mid \Delta, \Theta \rangle$, where
$\langle \vec{t} \mid \Delta \rangle$ and $\langle \vec{s} \mid \Theta \rangle$ are the configurations representing $N_1$ and $N_2$ respectively
(without loss of generality we assume that these configurations do not share
variables, since we work modulo $\alpha$-conversion).

4. If $N_1$, $N_2$ are semi-simple nets, a Cut between ports $i$ and $j$ builds a semi-simple
net represented by the configuration $\langle \vec{s} - s_i, \vec{t} - t_j \mid \Delta, \Theta, s_i = t_j \rangle$,
where $N_1 = \langle \vec{s} \mid \Delta \rangle$ and $N_2 = \langle \vec{t} \mid \Theta \rangle$, $\vec{s} - s_i$ is the multiset $\vec{s}$ without the
element $s_i$, and $\vec{t} - t_j$ is the multiset $\vec{t}$ without the element $t_j$.

5. If $N_1, \ldots, N_n$ are semi-simple nets, the Graft of an agent $\alpha$ to the nets
$N_1, \ldots, N_n$ according to its partitions builds a semi-simple net represented by
$\langle \alpha(\vec{s}_1, \ldots, \vec{s}_n), \vec{t}_1, \ldots, \vec{t}_n \mid \Delta_1, \ldots, \Delta_n \rangle$, where $N_i = \langle \vec{s}_i, \vec{t}_i \mid \Delta_i \rangle$, $1 \le i \le n$,
and we assume without loss of generality that these configurations do not
share variables.



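Example 5 (Non-termination). Consider the net $\langle x, y \mid \alpha(x) = \beta(\alpha(y)) \rangle$ and the
rule $\alpha(a) \bowtie \beta(\beta(\alpha(a)))$. Then the following non-terminating reduction sequence is
possible: $\langle x, y \mid \alpha(x) = \beta(\alpha(y)) \rangle \longrightarrow \langle x, y \mid x = a, \beta(\alpha(a)) = \alpha(y) \rangle \longrightarrow \langle a, y \mid
\beta(\alpha(a)) = \alpha(y) \rangle \longrightarrow \cdots$.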

There is an obvious question to ask about this language with respect to the 
graphical formalism: can we write all interaction rules? Under some assumptions, 
the answer is yes. There are in fact two restrictions. The first one is that there is 
no way of writing a rule with an active pair in the right-hand side. This is not a 
problem since interaction nets can be assumed to satisfy the optimization condi- 
tion [12], which requires no active pairs in right-hand sides. The second problem 
is the representation of interaction rules for active pairs without interface. In the 
calculus, an active pair without interface can only rewrite to the empty net. In 
other words, it will be erased, and so can be ignored. This coincides with the 
operational semantics defined in [7].

3.1 A Type Discipline 

We now define a typed version of the calculus using type variables $\varphi_1, \varphi_2, \ldots$, and
type constructors with fixed arities (such as int, bool, ... of arity 0; list, ... of
arity 1; $\times$, $\otimes$, $\parr$, ... of arity 2; ...). The terms built on this signature ($\tau$ =
list(int), list(list(bool)), $\otimes(\varphi_1, \varphi_2)$, ...) are the types of the system, and they are
user-defined in the same way as the set $\Sigma$ of agents is. Note that we may have
type constructors with the same names as agents, as in the case of $\otimes$ which is
traditionally used as a type constructor and as an agent in proof nets.

Types may be decorated with signs: $\tau^+$ will be called an output type, and
$\tau^-$ an input type. The dual of an atomic type $\sigma^+$ (resp. $\sigma^-$) is $\sigma^-$ (resp. $\sigma^+$),
and in general we will denote by $\sigma^\perp$ the dual of the type $\sigma$, which is defined by
a set of type equations of the form $C(\sigma_1, \ldots, \sigma_n)^\perp = C'(\sigma_1^\perp, \ldots, \sigma_n^\perp)$ such that
$(\sigma^\perp)^\perp = \sigma$. For example

$(\mathsf{list}^-(\sigma))^\perp = \mathsf{list}^+(\sigma^\perp)$

$(\mathsf{list}^+(\sigma))^\perp = \mathsf{list}^-(\sigma^\perp)$

Moreover, we might have other equations defining equalities between types,
such as: $(\sigma \times \tau) \times \rho = \sigma \times (\tau \times \rho)$.
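
The way duality is computed from a user-defined set of type equations can be sketched as follows. The encoding is an assumption made for illustration: a closed type is a pair `(constructor, args)`, signs are folded into the constructor name, type variables are left out, and the table `DUAL` is a hypothetical set of equations that pairs each constructor with its dual.

```python
# Hypothetical user-defined equations: each constructor and its dual.
# The table has to be an involution so that the dual of the dual of a
# type is the type itself.
DUAL = {
    "int+": "int-", "int-": "int+",
    "list+": "list-", "list-": "list+",
    "tensor": "par", "par": "tensor",
}


def dual(ty):
    """Dual of a closed type written as (constructor, (arg, ...))."""
    name, args = ty
    return (DUAL[name], tuple(dual(a) for a in args))


INT_OUT = ("int+", ())
LIST_OF_INT = ("list+", (INT_OUT,))
TENSOR_INT = ("tensor", (INT_OUT, INT_OUT))

dual_list = dual(LIST_OF_INT)
dual_tensor = dual(TENSOR_INT)
```

Here `dual_list` corresponds to the second example equation in the text, with the list constructor changing sign and the argument dualised recursively.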

Types will be assigned to the terms of the calculus by using the following 
inference rules, which are a form of one-sided sequents. 



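Identity Group. The Axiom allows variables of dual types to be introduced to
the system, and the Cut rule says that an equation is typeable (denoted $t = u : \diamond$)
if both sides are typeable with dual types.

$$\frac{}{x : \sigma,\; x : \sigma^\perp}\,(\mathrm{Ax}) \qquad\qquad \frac{\Gamma, t : \sigma \qquad \Delta, u : \sigma^\perp}{\Gamma, \Delta, t = u : \diamond}\,(\mathrm{Cut})$$

Structural Group. The Exchange rule allows permutations on the order of the
sequent, and the Mix rule allows the combination of two sequents:

$$\frac{\Gamma, t : \sigma, u : \tau, \Delta}{\Gamma, u : \tau, t : \sigma, \Delta}\,(\mathrm{Exchange}) \qquad\qquad \frac{\Gamma \qquad \Delta}{\Gamma, \Delta}\,(\mathrm{Mix})$$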



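User-defined Group. For each agent $\alpha \in \Sigma$, there is a rule that specifies the way
types are assigned to terms rooted by $\alpha$. The general format is:

$$\frac{\Gamma_1, t_1 : \sigma_1, \ldots, t_{i_1} : \sigma_{i_1} \qquad \cdots \qquad \Gamma_k, t_{i_k} : \sigma_{i_k}, \ldots, t_n : \sigma_n}{\Gamma_1, \ldots, \Gamma_k, \alpha(t_1, \ldots, t_n) : \sigma}\,(\mathrm{Graft})$$

The Graft rule for $\alpha$ specifies its partitions, that is, the way the subterms
$t_1, \ldots, t_n$ are distributed in the premises. For example, a set of Graft rules for the
system defined in Example 2, Part 1 (arithmetic), together with a polymorphic
erasing agent $\epsilon$ defined by rules of the form $\epsilon \bowtie \alpha(\epsilon, \ldots, \epsilon)$, can be defined as
follows:

$$\frac{\Gamma, t : \mathsf{int}^+}{\Gamma, S(t) : \mathsf{int}^+}\,(S) \qquad\qquad \frac{\Gamma}{\Gamma, Z : \mathsf{int}^+}\,(Z)$$

$$\frac{\Gamma, t_1 : \mathsf{int}^- \qquad \Delta, t_2 : \mathsf{int}^+}{\Gamma, \Delta, Add(t_1, t_2) : \mathsf{int}^-}\,(Add) \qquad\qquad \frac{\Gamma}{\Gamma, \epsilon : \sigma}\,(\epsilon)$$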

Definition 7 (Typeable Configurations). Let $\{x_1, \ldots, x_m\}$ be the set of
free names of $t$, then $t$ is a term of type $\sigma$ if there is a derivation ending in
$x_1 : \tau_1, \ldots, x_m : \tau_m, t : \sigma$. Equations are typed in a similar way: if $t = u$ is an
equation with free names $\{x_1, \ldots, x_m\}$, then it is typeable if there is a derivation
for $x_1 : \tau_1, \ldots, x_m : \tau_m, t = u : \diamond$. This notion can be extended to multisets of
equations in a straightforward way. A configuration $\langle t_1, \ldots, t_n \mid s_1 = u_1, \ldots, s_m = u_m \rangle$
with free names $x_1, \ldots, x_p$ is typeable by $\sigma_1, \ldots, \sigma_n$, if there is a derivation for
$x_1 : \rho_1, \ldots, x_p : \rho_p, t_1 : \sigma_1, \ldots, t_n : \sigma_n, s_1 = u_1 : \diamond, \ldots, s_m = u_m : \diamond$.



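Example 6. The following is a type derivation for the net $\langle a \mid Add(a, Z) = S(Z) \rangle$
in Example 2, Part 1, using the set of Graft rules defined above.

$$
\dfrac{
  \dfrac{
    \dfrac{}{a : \mathsf{int}^+,\; a : \mathsf{int}^-}\,(\mathrm{Ax})
    \qquad
    \dfrac{}{Z : \mathsf{int}^+}\,(Z)
  }{a : \mathsf{int}^+,\; Add(a, Z) : \mathsf{int}^-}\,(Add)
  \qquad
  \dfrac{\dfrac{}{Z : \mathsf{int}^+}\,(Z)}{S(Z) : \mathsf{int}^+}\,(S)
}{a : \mathsf{int}^+,\; Add(a, Z) = S(Z) : \diamond}\,(\mathrm{Cut})
$$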



The typing of rules is more delicate, since we have to ensure that both sides 
are consistent with respect to types so that the application of the Interaction 
Rule preserves types (Subject Reduction). We do this in two steps: first we find a 
type derivation for the active pair (using arbitrary names for the free variables) , 
and then use this derivation as a template to build a derivation for the right-hand 
side. 



Definition 8 (Typeable Rules). Let $\Sigma$ be a given set of agents, with a corresponding
set of Graft rules. We say that a rule $\alpha(t_1, \ldots, t_n) \bowtie \beta(s_1, \ldots, s_m)$
is typeable if

1. there is a derivation $D$ with conclusion $\alpha(x_1, \ldots, x_n) = \beta(y_1, \ldots, y_m) : \diamond$ and
leaves containing assumptions for $x_1, \ldots, x_n, y_1, \ldots, y_m$, built by application
of the Cut rule and the Graft rules for $\alpha$, $\beta$,

2. there is a type derivation with the same assumptions leading to the conclusion
$x_1 = t_1 : \diamond, \ldots, x_n = t_n : \diamond, y_1 = s_1 : \diamond, \ldots, y_m = s_m : \diamond$,

3. and whenever an equation $\alpha(t'_1, \ldots, t'_n) = \beta(s'_1, \ldots, s'_m)$ is typeable, its type
derivation is obtained by using the Cut rule and instances of the Graft rules
for $\alpha$, $\beta$ applied in $D$.



Example 7. The interaction rules for addition given in Example 2, Part 1, are 
typeable. We show the type derivation for $Add(S(x), y) \bowtie S(Add(x, y))$. First we 
build the most general derivation for the active pair (to ensure that condition 3 
in Definition 8 holds): 

$$
\dfrac{
  \dfrac{x_1{:}\mathsf{int}^{-} \qquad x_2{:}\mathsf{int}^{+}}{Add(x_1, x_2){:}\mathsf{int}^{-}}\,(Add)
  \qquad
  \dfrac{x_3{:}\mathsf{int}^{+}}{S(x_3){:}\mathsf{int}^{+}}\,(S)
}{Add(x_1, x_2) = S(x_3){:}o}\,(Cut)
$$
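
Then we use this template to build a second derivation: 

$$
\dfrac{
  x_1{:}\mathsf{int}^{-}
  \qquad
  \dfrac{
    x_2{:}\mathsf{int}^{+}
    \qquad
    \dfrac{
      x_3{:}\mathsf{int}^{+}
      \qquad
      \dfrac{
        \dfrac{
          \dfrac{}{x{:}\mathsf{int}^{-},\ x{:}\mathsf{int}^{+}}\,(Ax)
        }{x{:}\mathsf{int}^{-},\ S(x){:}\mathsf{int}^{+}}\,(S)
        \qquad
        \dfrac{}{y{:}\mathsf{int}^{-},\ y{:}\mathsf{int}^{+}}\,(Ax)
      }{y{:}\mathsf{int}^{-},\ S(x){:}\mathsf{int}^{+},\ Add(x, y){:}\mathsf{int}^{-}}\,(Add)
    }{y{:}\mathsf{int}^{-},\ S(x){:}\mathsf{int}^{+},\ x_3 = Add(x, y){:}o}\,(Cut)
  }{S(x){:}\mathsf{int}^{+},\ x_2 = y{:}o,\ x_3 = Add(x, y){:}o}\,(Cut)
}{x_1 = S(x){:}o,\ x_2 = y{:}o,\ x_3 = Add(x, y){:}o}\,(Cut)
$$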



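Consider now the encoding of proof nets (Example 3), with the interaction 
rule $\otimes(a, b) \bowtie ⅋(a, b)$. We will show that a cut-elimination step in linear logic 
proof nets is typeable. Here is the most general derivation for the active pair: 

$$
\dfrac{
  \dfrac{x_1{:}\sigma_1 \qquad x_2{:}\sigma_2}{\otimes(x_1, x_2){:}\otimes(\sigma_1, \sigma_2)}\,(\otimes)
  \qquad
  \dfrac{x_3{:}\sigma_1^{\perp},\ x_4{:}\sigma_2^{\perp}}{⅋(x_3, x_4){:}⅋(\sigma_1^{\perp}, \sigma_2^{\perp})}\,(⅋)
}{\otimes(x_1, x_2) = ⅋(x_3, x_4){:}o}\,(Cut)
$$

We build now the second derivation using this template: 

$$
\dfrac{
  x_1{:}\sigma_1
  \qquad
  \dfrac{
    x_2{:}\sigma_2
    \qquad
    \dfrac{
      \dfrac{}{a{:}\sigma_1^{\perp},\ a{:}\sigma_1}\,(Ax)
      \qquad
      \dfrac{
        \dfrac{}{b{:}\sigma_2^{\perp},\ b{:}\sigma_2}\,(Ax)
        \qquad
        x_3{:}\sigma_1^{\perp},\ x_4{:}\sigma_2^{\perp}
      }{b{:}\sigma_2^{\perp},\ x_3{:}\sigma_1^{\perp},\ x_4 = b{:}o}\,(Cut)
    }{a{:}\sigma_1^{\perp},\ b{:}\sigma_2^{\perp},\ x_3 = a{:}o,\ x_4 = b{:}o}\,(Cut)
  }{a{:}\sigma_1^{\perp},\ x_2 = b{:}o,\ x_3 = a{:}o,\ x_4 = b{:}o}\,(Cut)
}{x_1 = a{:}o,\ x_2 = b{:}o,\ x_3 = a{:}o,\ x_4 = b{:}o}\,(Cut)
$$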



The last condition in Definition 8 is crucial for Subject Reduction, as the 
following example shows. 

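Example 8 (Untypeable Rule). Let $\Sigma = \{\alpha, S\}$ where $ar(\alpha) = ar(S) = 1$, together 
with the interaction rule $\alpha(x) \bowtie S(x)$. The agents are typed with the 
Graft rules: 

$$
\dfrac{\Gamma,\ t{:}\mathsf{int}^{+}}{\Gamma,\ S(t){:}\mathsf{int}^{+}}\,(S)
\qquad
\dfrac{\Gamma,\ t{:}\sigma}{\Gamma,\ \alpha(t){:}\mathsf{int}^{-}}\,(\alpha)
$$

We first build a type derivation for the active pair: 

$$
\dfrac{
  \dfrac{x_1{:}\mathsf{int}^{-}}{\alpha(x_1){:}\mathsf{int}^{-}}\,(\alpha)
  \qquad
  \dfrac{y_1{:}\mathsf{int}^{+}}{S(y_1){:}\mathsf{int}^{+}}\,(S)
}{\alpha(x_1) = S(y_1){:}o}\,(Cut)
$$

With this template we can build a type derivation for the right-hand side: 

$$
\dfrac{
  x_1{:}\mathsf{int}^{-}
  \qquad
  \dfrac{
    \dfrac{}{x{:}\mathsf{int}^{-},\ x{:}\mathsf{int}^{+}}\,(Ax)
    \qquad
    y_1{:}\mathsf{int}^{+}
  }{x{:}\mathsf{int}^{+},\ y_1 = x{:}o}\,(Cut)
}{x_1 = x{:}o,\ y_1 = x{:}o}\,(Cut)
$$

But the equation $\alpha(x') = S(y')$ is also typeable by: 

$$
\dfrac{
  \dfrac{x'{:}\mathsf{bool}^{-}}{\alpha(x'){:}\mathsf{int}^{-}}\,(\alpha)
  \qquad
  \dfrac{y'{:}\mathsf{int}^{+}}{S(y'){:}\mathsf{int}^{+}}\,(S)
}{\alpha(x') = S(y'){:}o}\,(Cut)
$$

which is not an instance of the first derivation. This means that condition 3 
in Definition 8 is not satisfied, therefore the rule is not typeable. If we accept 
this rule, we obtain a system that is not sound with respect to types: the net 
$\langle\ \mid \alpha(True) = S(Z) \rangle$, where True has type bool and Z int, is typeable, but 
reduces to $\langle\ \mid True = Z \rangle$, which is not typeable. 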
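
The failure of Subject Reduction in Example 8 can be replayed mechanically on ground terms. The following Python sketch is only an illustration and not part of the calculus: the tuple encoding of terms and types, the helper names `dual`, `type_of` and `cut_ok`, and the negative polarity chosen for `True` (mirroring the assumption for x' in the last derivation) are all assumptions made for the example.

```python
# Types are (name, polarity) pairs; the dual flips the polarity.
def dual(ty):
    name, pol = ty
    return (name, "-" if pol == "+" else "+")

def type_of(term):
    """Type of a ground term (agent, arg1, ..., argn) under the Graft
    rules of Example 8; raises TypeError if the term is not typeable."""
    agent, *args = term
    if agent == "Z":
        return ("int", "+")
    if agent == "True":              # assumed polarity, see the lead-in
        return ("bool", "-")
    if agent == "S":                 # (S): the argument must be int+
        if type_of(args[0]) != ("int", "+"):
            raise TypeError("S expects an argument of type int+")
        return ("int", "+")
    if agent == "alpha":             # (alpha): any argument type sigma
        type_of(args[0])
        return ("int", "-")
    raise TypeError(f"unknown agent {agent}")

def cut_ok(t, u):
    """Cut rule: the equation t = u is typeable iff the sides have dual types."""
    return type_of(t) == dual(type_of(u))

Z, TRUE = ("Z",), ("True",)
before = cut_ok(("alpha", TRUE), ("S", Z))   # alpha(True) = S(Z)
after = cut_ok(TRUE, Z)                      # True = Z, after interaction
print(before, after)  # True False
```

The active pair passes the Cut check because the (alpha) rule ignores the type of its argument, while the equation produced by the rule fails it, which is the unsoundness that condition 3 of Definition 8 rules out.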



Theorem 1 (Subject Reduction). The computation rules Indirection, Interaction, 
Context and Collect preserve typeability and types: for any set $\mathcal{R}$ of 
typeable rules and configuration $c$ on $\Sigma$, if $c \rightarrow c'$ then $c'$ can be assigned the 
same types as $c$. 

Proof. The cases of Indirection with Context and Collect are straightforward; 
we show the case of Interaction. Let 

$$c = \langle \vec{t} \mid \Gamma, \alpha(\vec{u}) = \beta(\vec{v}), \Gamma' \rangle \qquad c' = \langle \vec{t} \mid \Gamma, \vec{u} = \vec{u'}, \vec{v} = \vec{v'}, \Gamma' \rangle$$ 

using the interaction rule $\alpha(\vec{u'}) \bowtie \beta(\vec{v'})$. Assume that $c$ is typeable (more precisely, 
there is a derivation for the sequent $\Delta, \alpha(\vec{u}) = \beta(\vec{v}){:}o$ corresponding 
to $c$), and that the rule $\alpha(\vec{u'}) \bowtie \beta(\vec{v'})$ is typeable. We show that the sequent 
$\Delta, \vec{u} = \vec{u'}{:}o, \vec{v} = \vec{v'}{:}o$ corresponding to $c'$ is derivable. By Definition 8 (part 2), 
there is a proof tree for $\vec{x} = \vec{u'}{:}o, \vec{y} = \vec{v'}{:}o$ using as template the proof tree of 
$\alpha(\vec{x}) = \beta(\vec{y}){:}o$. By Definition 8 (part 3), the $(\alpha)$ and $(\beta)$ typing rules used in 
the proof of $c$ are instances (say with a substitution $S$) of the ones used in the 
proof of $\alpha(\vec{x}) = \beta(\vec{y}){:}o$. Hence, we can build the derivation tree for the sequent 
associated to $c'$ by using the instance (by substitution $S$) of the proof tree of 
$\vec{x} = \vec{u'}{:}o, \vec{y} = \vec{v'}{:}o$, replacing $x_i, y_i$ by $u_i, v_i$, and replacing the leaves containing 
assumptions for $x_i, y_i$ by the corresponding proofs of $u_i, v_i$ (subtrees of the proof 
of $c$). □ 

We remark that the notion of partition is built-in in our type system: the 
partitions of an agent correspond to the hypotheses in its Graft rule. This means 
that our type system can be used to check semi-simplicity of nets. 

Proposition 1. Typeable configurations are semi-simple (without vicious circles 
of principal ports). 

Proof. Induction on the type derivation. Note that the names given to the type 
rules of our system coincide with the names given to the operations that build 
semi-simple nets (cf. Example 4). □ 

Related systems. Danos and Regnier [5] generalized the multiplicative connec- 
tives of linear logic, showing the general format of the introduction rule (for a 
connective and its dual), and the cut-elimination step. Each connective corre- 
sponds to an agent and a type constructor in our system, and their rules coincide 
with our Graft rules. The cut-elimination steps are defined through interaction 
rules in our system (and are not necessarily rewirings, as in the case of multi- 
plicatives). 

Lafont [12] introduced a basic type discipline for the graphical framework of 
interaction nets, using a set of constant types ($\tau \in \{atom, nat, list, \ldots\}$). For each 
agent, ports are classified between input and output: input ports have types of 
the form $\tau^{-}$, whereas output ports have types $\tau^{+}$. A net is well-typed in Lafont's 
system if each agent is used with the correct type and input ports are connected 
to output ports of the same type. Our system is a generalization of Lafont's 
system (since we introduce polymorphism and a more general notion of duality) 
integrating the notion of partition. It is easy to see that the typed nets of Lafont 
are represented by configurations which are typeable in our calculus (the proof 
is by a straightforward induction on the structure of the configuration). 

In [6] another type system for interaction nets is discussed, using type variables 
and intersection types. The intersection-free part of this system is also 
a subsystem of ours, but we also consider polymorphic type constructors and 
build in the notion of partition. 



3.2 Properties of the Calculus 

This section is devoted to showing various properties of the rewriting system 
defined by the rules Indirection, Interaction, Context, and Collect. These results 
are known for the graphical formalism of interaction nets, but here we show that 
the calculus, which is a decomposed system of interaction, also preserves these 
properties. 

Proposition 2 (Confluence). $\rightarrow$ is strongly confluent. 

Proof. All the critical pairs are joinable in one step, using Lemma 1. □ 

We write $c \Downarrow c'$ iff $c \rightarrow^{*} c' \not\rightarrow$. As an immediate consequence of the previous 
property we obtain: 

Proposition 3 (Determinacy). $c \Downarrow c'$ and $c \Downarrow c''$ implies $c' = c''$. 

It is a rather interesting phenomenon that interaction nets representing infinite 
loops (infinite computations) can terminate. This is analogous to “black 
hole” detection, which is well-known in graph reduction for functional languages. 
Two examples of this are the following configurations: $\langle y \mid x = \delta(x, y) \rangle$, where 
$\delta$ is the duplicator agent, and $\langle\ \mid x = x \rangle$, which both can be thought of as 
representing the cyclic term a = a in cyclic term rewriting [2]. Both of these 
configurations are irreducible. The latter is a net without an interface, and the 
former is the same thing with an interface. Hence our interaction net configurations 
allow us to distinguish these two cases. 

Proposition 4. The rules Indirection and Collect are terminating. 

Proof. Both rules decrease the number of equations in a configuration. □ 
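
To make the counting argument concrete, here is a minimal Python sketch of the two rules, written under assumptions of our own: names are strings, an agent applied to arguments is a tuple `(agent, arg1, ..., argn)`, equations are ordered pairs (the calculus treats them as symmetric), and the helper names `names`, `subst`, `indirection` and `collect` are hypothetical.

```python
def names(t):
    """All name occurrences in a term, in left-to-right order."""
    if isinstance(t, str):
        return [t]
    return [n for arg in t[1:] for n in names(arg)]

def subst(t, x, s):
    """Replace the name x by the term s everywhere in t."""
    if isinstance(t, str):
        return s if t == x else t
    return (t[0],) + tuple(subst(arg, x, s) for arg in t[1:])

def indirection(eqs, i, j):
    """x = t, u = v  -->  u[t/x] = v, where eqs[i] is x = t and x occurs in u."""
    x, t = eqs[i]
    u, v = eqs[j]
    assert isinstance(x, str) and x in names(u)
    rest = [e for k, e in enumerate(eqs) if k not in (i, j)]
    return rest + [(subst(u, x, t), v)]

def collect(head, eqs, i):
    """<..x.. | x = t, D>  -->  <..t.. | D>, where x occurs in the head."""
    x, t = eqs[i]
    assert isinstance(x, str) and any(x in names(h) for h in head)
    return [subst(h, x, t) for h in head], eqs[:i] + eqs[i + 1:]

# x = Z, Add(r, x) = S(Z)   -->   Add(r, Z) = S(Z)
eqs = [("x", ("Z",)), (("Add", "r", "x"), ("S", ("Z",)))]
eqs1 = indirection(eqs, 0, 1)
# <r | r = S(Z)>   -->   <S(Z) | >
head2, eqs2 = collect(["r"], [("r", ("S", ("Z",)))], 0)
print(len(eqs), len(eqs1), len(eqs2))  # 2 1 0
```

Each call removes exactly one equation, so no infinite sequence of Indirection and Collect steps is possible.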

Non-termination arises because of the Interaction rule (cf. Example 5). There 
are several criteria for termination of nets, which were defined for the graphical 
framework, see for example [3] and [8]. These can be recast in the textual 
language in a much cleaner, more concise way. 



4 Normal Forms and Strategies 

Although we have stressed the fact that systems of interaction are strongly 
confluent, there are clearly many ways of obtaining the normal form (if one 
exists), and moreover there is scope for alternative kinds of normal form, for 
instance those induced by weak reduction. 

In this section we study several different notions of normal form and strategies 
for evaluation of interaction nets. The calculus provides a simple way of express- 
ing these concepts which are quite hard to formalize in the graphical framework. 
In addition, we define some extra rules that can optimize the computations. 

There are essentially two standard notions of normal form for rewriting systems: 
full normal form and weak normal form (also known as root stable, weak 
head normal form, etc.). In the λ-calculus one also has notions such as head 
normal form, which allows reduction under the top constructor. 

These notions can be recast in our calculus, providing in this way a theory 
for the implementation of interaction nets. One of the most fruitful applications 
of this would be the implementation of nets containing disconnected non-terminating 
computations, which are crucial for the coding of the λ-calculus 
and functional programming languages, for instance. 

We begin with the weakest form of reduction that we will introduce. 

Definition 9 (Interface Normal Form). A configuration $(\mathcal{R}, \langle \vec{t} \mid \Delta \rangle)$ is in 
interface normal form (INF) if each $t_i$ in $\vec{t}$ is of one of the following forms: 

- $\alpha(\vec{s})$. E.g. $\langle S(x) \mid x = Z, \Delta \rangle$. 

- $x$ where $x \in \mathcal{N}(t_j)$, $i \neq j$. This is called an open path. E.g. $\langle x, x \mid \Delta \rangle$. 

- $x$ where $x$ occurs in a cycle in $\Delta$. E.g. $\langle x \mid y = \alpha(\beta(y), x), \Delta \rangle$. 

Clearly any net with no interface $\langle\ \mid \Delta \rangle$ is in interface normal form. 

Intuitively, an interaction net is in interface normal form when there are 
agents with principal ports on all of the observable interface, or, if there are ports 
in the interface that are not principal, then they will never become principal 
by reduction (since they are in an open path or a cycle). This idea can also 
be adapted to the typed framework, where we may require that only terms 
with positive types appear in the head of the configuration in interface normal 
form. This corresponds exactly to the Canonical Forms defined in [7], where 
negative ports are not observable. Additionally, this notion of normal form can 
be generalized to deal with a user defined set of values in the interface (Value 
Normal Form), which can allow some reduction under the top constructor. 

The second notion of normal form that we introduce is the strongest one in 
that all reductions possible have been done, and corresponds to the usual notion 
of normal form for interaction nets. 

Definition 10 (Full Normal Form). A configuration $(\mathcal{R}, \langle \vec{t} \mid \Delta \rangle)$ is in full 
normal form if either $\Delta$ is empty, or all equations in $\Delta$ are of the form $x = t$ 
where $x \in t$ or $x$ is free in $\langle \vec{t} \mid \Delta \rangle$. 

Having defined the notions of normal form, we now give the corresponding 
algorithms to compute them. We suggest different ways of evaluating a net to 
normal form, each of which could be useful for various applications. 

Full Reduction. To obtain the full normal form of a net, we apply the computa- 
tion rules until we obtain an irreducible configuration. Since interaction nets are 
strongly confluent, there is clearly no need to impose a strategy, since all reduc- 
tions will eventually be done. However, there are additional factors, involving 
the size of the net and non-termination, that suggest that different strategies 
can be imposed: 

Priority Reduction. As many of the examples that we have already given indi- 
cate, there are interaction rules which reduce the size of the net, keep the size 
constant, and increase the size of the net. More formally, we define the size of a 
term: 

$$|x| = 0$$ 

$$|\alpha(t_1, \ldots, t_n)| = 1 + |t_1| + \cdots + |t_n|$$ 

The size of an equation is given by $|t = u| = |t| + |u|$. For an interaction rule 
$r = \alpha(\vec{t}) \bowtie \beta(\vec{u})$, let $s = |\vec{t}| + |\vec{u}|$. Then $r$ is said to be expansive if $s > 2$, reducing 
if $s < 2$, otherwise it is stable. We can then define an ordering on the rules, and 
give priority to the ones which are reducing. 
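
The size measure and the resulting classification are easy to compute. The Python sketch below is an illustration under our own assumptions: names are strings, agents are tuples `(agent, arg1, ..., argn)`, the size of a vector of terms is taken to be the sum of the sizes of its components, and the function names `size` and `classify` are hypothetical.

```python
def size(t):
    """|x| = 0 for a name; |alpha(t1, ..., tn)| = 1 + |t1| + ... + |tn|."""
    if isinstance(t, str):
        return 0
    return 1 + sum(size(arg) for arg in t[1:])

def classify(lhs, rhs):
    """Classify the rule lhs >< rhs by s = |args of lhs| + |args of rhs|."""
    s = sum(size(a) for a in lhs[1:]) + sum(size(a) for a in rhs[1:])
    if s > 2:
        return "expansive"
    if s < 2:
        return "reducing"
    return "stable"

# Add(S(x), y) >< S(Add(x, y)): two agents are consumed, two are created.
rule_succ = classify(("Add", ("S", "x"), "y"), ("S", ("Add", "x", "y")))
# A rule of the shape Add(y, y) >< Z (assumed here for illustration)
# creates no agents at all.
rule_zero = classify(("Add", "y", "y"), ("Z",))
print(rule_succ, rule_zero)  # stable reducing
```

The threshold 2 reflects the two agents of the active pair that every interaction consumes, so a scheduler could sort pending active pairs by this classification and fire reducing rules first.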

Connected reduction. Using this strategy, an application of Interaction to $t = u$ 
in $\langle \vec{t} \mid t = u, \Delta \rangle$ would only be allowed if $\mathcal{N}(\vec{t}) \cap \mathcal{N}(t = u) \neq \emptyset$. Thus subnets 
that are not connected to the observable interface will not be evaluated. As a 
direct application of this strategy we can define an accelerated garbage collection. 
Interaction nets are very good at capturing the explicit dynamics of garbage 
collection. However, nets have to be in normal form before they can be erased. 
Therefore, non-terminating nets can never be erased. With this strategy, we can 
ignore isolated nets since they will never contribute to the interface of the net. 
It is therefore useful to add an additional rule, called Cleanup, which explicitly 
removes these components, without reducing to normal form first. If there are 
no free names in the equation t = u, then 

$$\langle \vec{t} \mid t = u, \Delta \rangle \longrightarrow \langle \vec{t} \mid \Delta \rangle$$ 
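
As a rough illustration of Cleanup, the following Python sketch drops every equation without free names. It rests on assumptions of our own: names are strings, agents are tuples `(agent, arg1, ..., argn)`, and a name is taken to be free in an equation when it occurs exactly once in that equation (so an equation with no free names cannot be linked to the rest of the configuration). The function names are hypothetical.

```python
from collections import Counter

def names(t):
    """All name occurrences in a term, in left-to-right order."""
    if isinstance(t, str):
        return [t]
    return [n for arg in t[1:] for n in names(arg)]

def cleanup(head, eqs):
    """Remove each equation t = u that has no free names; keep the others."""
    kept = []
    for t, u in eqs:
        counts = Counter(names(t) + names(u))
        if any(c == 1 for c in counts.values()):
            kept.append((t, u))      # some name links it to the rest
    return head, kept

# <x | alpha(y) = beta(y), x = Z>: the first equation is isolated.
head, eqs = cleanup(["x"], [(("alpha", "y"), ("beta", "y")), ("x", ("Z",))])
print(eqs)  # [('x', ('Z',))]
```

Note that the isolated equation is erased without being reduced, so it is removed even if reducing it would never terminate.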



Weak Reduction. Computing interface normal forms suggests that we do the 
minimum work required to bring principal ports to the interface; computing 
value normal form requires that each agent in the observable interface is a value. 
Both of these can be described by placing conditions on the rewrite system. Here 
we just focus on interface normal form. The following conditional version of the 
computation rules is enough to obtain this. Given a configuration of the form: 
$\langle t_1, \ldots, x, \ldots, t_n \mid t = u, \Delta \rangle$, where $x \in \mathcal{N}(t = u)$, then any of the computation 
rules can be applied to t = u. This process is repeated until the Collect rule 
has brought agents to the interface, or the configuration is irreducible. We can 
formalize this as a set of evaluation rules, for instance one such way is: 

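Axiom: 

$$\dfrac{c \in \mathrm{INF}}{c \Downarrow c}$$ 

Collect: 

$$\dfrac{\langle t_1, \ldots, t, \ldots, t_n \mid \Delta \rangle \Downarrow c}{\langle t_1, \ldots, x, \ldots, t_n \mid x = t, \Delta \rangle \Downarrow c}$$ 

Indirection: if $x \in \mathcal{N}(u)$ and $y \in \mathcal{N}(t, u = v)$ 

$$\dfrac{\langle t_1, \ldots, y, \ldots, t_n \mid u[t/x] = v, \Delta \rangle \Downarrow c}{\langle t_1, \ldots, y, \ldots, t_n \mid x = t, u = v, \Delta \rangle \Downarrow c}$$ 

Interaction: if $x \in \mathcal{N}(\alpha(\vec{t}) = \beta(\vec{u}))$ and $\alpha(\vec{t'}) \bowtie \beta(\vec{u'}) \in \mathcal{R}$ 

$$\dfrac{\langle s_1, \ldots, x, \ldots, s_n \mid \vec{t} = \vec{t'}, \vec{u} = \vec{u'}, \Delta \rangle \Downarrow c}{\langle s_1, \ldots, x, \ldots, s_n \mid \alpha(\vec{t}) = \beta(\vec{u}), \Delta \rangle \Downarrow c}$$ 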

Note the simplicity of this definition, compared with the definitions given in the 
graphical framework. 



5 Conclusions 

We have given a calculus for interaction nets, together with a type system, and 
its operational theory. This calculus provides a solid foundation for an imple- 
mentation of interaction nets. In particular, we can express strategies, and define 
notions of weak reduction which are essential for actual implementations of 
interaction nets for realistic computations. 

The language has also been extended to attach a value to an agent, and 
allow interaction rules to use this value in the right-hand side of the rule. Such 
a system is analogous to the extension of the λ-calculus with δ rules, as done for 
instance with the language PCF. 

There are various directions for further study. The operational account has 
led to the development of an Abstract Machine for interaction nets [17]. This 
could be used as a basis for formalizing an implementation and could also serve 
as the basis for the development of a parallel abstract machine. 


Abstract. Curry is a multi-paradigm declarative language covering 
functional, logic, and concurrent programming paradigms. Curry’s op- 
erational semantics is based on lazy reduction of expressions extended 
by a possibly non-deterministic binding of free variables occurring in 
expressions. Moreover, constraints can be executed concurrently which 
provides for concurrent computation threads that are synchronized on 
logical variables. In this paper, we extend Curry’s basic computational 
model by a few primitives to support distributed applications where a 
dynamically changing number of different program units must be coordi- 
nated. We develop these primitives as a special case of the existing basic 
model so that the new primitives interact smoothly with the existing 
features for search and concurrent computations. Moreover, programs 
with local concurrency can be easily transformed into distributed appli- 
cations. This supports a simple development of distributed systems that 
are executable on local networks as well as on the Internet. In particu- 
lar, sending partially instantiated messages containing logical variables 
is quite useful to implement reply messages. We demonstrate the power 
of these primitives by various programming examples. 



1 Introduction 

Curry [9,13] is a multi-paradigm declarative language which integrates func- 
tional, logic, and concurrent programming paradigms. Curry combines in a seam- 
less way features from functional programming (nested expressions, lazy evalua- 
tion, higher-order functions), logic programming (logical variables, partial data 
structures, built-in search), and concurrent programming (concurrent evalua- 
tion of expressions with synchronization on logical variables). Moreover, Curry 
provides additional features in comparison to the pure paradigms (compared to 
functional programming: search, computing with partial information; compared 
to logic programming: more efficient evaluation due to the deterministic and 
demand-driven evaluation of functions) and amalgamates the most important 
operational principles developed in the area of integrated functional logic lan- 
guages: “residuation” and “narrowing” (see [7] for a survey on functional logic 
programming). 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 188-205, 1999. 

Curry’s operational semantics is based on a single computation model, first 
described in [9], which combines lazy reduction of expressions with a possibly 
non-deterministic binding of free variables occurring in expressions. Thus, purely 
functional programming and purely logic programming are obtained as particular 
restrictions of this model. Moreover, impure features of Prolog (e.g., arithmetic, 
cut, I/O) are avoided and don’t know non-deterministic computations can be 
encapsulated and controlled by the programmer [12]. For concurrent computa- 
tions, the evaluation of functions can be suspended depending on the instantia- 
tion of arguments, and constraints can be executed concurrently. This provides 
an easy modeling of concurrent objects as functions synchronizing on a stream 
of messages. Based on this computation model, we propose to add a new kind of 
constraint to relate a multiset of incoming messages with a list containing these 
messages. Such port constraints have been proposed in the context of concur- 
rent logic programming [16] for the local communication between objects. We 
generalize and embed them into the functional logic language Curry to obtain a 
simple but powerful mechanism to implement distributed applications that are 
executable on a network with an unknown number of communication partners. 

The paper is structured as follows. In the next section, we review the ba- 
sics of the operational model of Curry. We introduce and discuss the necessary 
extensions of this model to support distributed applications in Section 3. We 
demonstrate the use of these features by several examples in Section 4. Section 5 
discusses some implementation issues and Section 6 relates our approach to other 
existing proposals before we conclude in Section 7. 

2 Operational Semantics of Curry 

In this section, we sketch the basic computation model of Curry. More details 
and a formal definition can be found in [9,13]. 

From a syntactic point of view, a Curry program is a functional program¹ 
extended by the possible inclusion of free (logical) variables in conditions and 
right-hand sides of defining rules. Thus, the basic computational domain of Curry 
consists of data terms, constructed from constants and data constructors, whose 
structure is specified by a set of data type declarations like 

data Bool = True | False 
data List a = [] | a : List a 

True and False are the Boolean constants and [] (empty list) and : (non- 
empty list) are the constructors for polymorphic lists (a is a type variable and 
the type List a is usually written as [a] for conformity with Haskell). Then, a 
data term is a well-formed expression containing variables, constants, and data 
constructors, e.g., True:[] or [x,y] (the latter stands for x:(y:[])). 

¹ Curry has a Haskell-like syntax [20], i.e., (type) variables and function names start 
with lowercase letters and the names of type and data constructors start with an 
uppercase letter. Moreover, the application of f to e is denoted by juxtaposition 
(“f e”). 

Functions are operations on data terms whose meaning is specified by (conditional) 
rules of the general form “l | c = r where vs free” where l has the 
form $f\ t_1 \ldots t_n$ with f being a function, $t_1, \ldots, t_n$ data terms and each variable 
occurs only once, the condition c is a constraint, r is a well-formed expression 
which may also contain function calls, and vs is the list of free variables that 
occur in c and r but not in l (the condition and the where parts can be omitted 
if c and vs are empty, respectively). A constraint is any expression of the built-in 
type Constraint where primitive constraints are equations of the form $e_1$ =:= $e_2$. 
A conditional rule can be applied if its condition is satisfiable. A Curry program 
is a set of data type declarations and rules. 

Example 1. Assume that the above data type declarations are given. Then the 
following rules define the concatenation of lists, the last element of a list, and 
a constraint which is satisfied if the first list argument is a prefix of the second 
list argument: 

conc [] ys = ys 
conc (x:xs) ys = x : conc xs ys 

last xs | conc ys [x] =:= xs = x where x,ys free 

prefix ps xs = let ys free in conc ps ys =:= xs 

If the equation “conc ys [x] =:= xs” is solvable, then x is the last element of 
the list xs. Similarly, ps is a prefix of xs if the equation “conc ps ys =:= xs” 
is solvable for some value ys (note that existentially quantified variables vs can 
be introduced in a constraint c by let vs free in c). 
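
For readers less familiar with logic variables, the way `last` is computed by solving an equation can be imitated in Python. This is only a sketch and not how a Curry system evaluates the rule: it assumes that the constraint “conc ys zs =:= xs” with free ys and zs is solved by enumerating every split of xs, and the helper name `solve_conc` is hypothetical.

```python
def solve_conc(xs):
    """Enumerate all pairs (ys, zs) with ys ++ zs == xs, i.e. the
    solutions of the constraint 'conc ys zs =:= xs' for free ys, zs."""
    for i in range(len(xs) + 1):
        yield xs[:i], xs[i:]

def last(xs):
    """All x such that 'conc ys [x] =:= xs' is solvable for some ys."""
    return [zs[0] for ys, zs in solve_conc(xs) if len(zs) == 1]

print(last([1, 2, 3]))  # [3]
print(last([]))         # []
```

The result is returned as a list of solutions: a non-empty list has exactly one last element, while for the empty list the condition is unsolvable and the rule is simply not applicable.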

Functional programming: In functional languages, the interest is in computing 
values of expressions, where a value does not contain function symbols (i.e., it 
is a data term) and should be equivalent (w.r.t. the program rules) to the initial 
expression. The value can be computed by applying rules from left to right. 
For instance, we compute the value of “conc [1] [2]” by applying the rules for 
concatenation to this expression: 

conc [1] [2] → 1:(conc [] [2]) → [1,2] 

To support computations with infinite data structures and a modular program- 
ming style by separating control aspects [14], Curry is based on a lazy (out- 
ermost) strategy, i.e., the selected function call in each reduction step is an 
outermost one among all reducible function calls. This strategy yields an opti- 
mal evaluation strategy [1] and a demand-driven search method [10] for the logic 
programming part that will be discussed next. 

Logic programming: In logic languages, expressions (or constraints) may contain 
free variables. A logic programming system should compute solutions, i.e., find 
values for these variables such that the expression (or constraint) is reducible to 
some value (or satisfiable) . Fortunately, it requires only a slight extension of the 
lazy reduction strategy to cover non-ground expressions and variable instantia- 
tion: if the value of a free variable is demanded by the left-hand sides of program 
rules in order to proceed with the computation (i.e., no program rule is applicable if 
the variable remains unbound), the variable is non-deterministically bound to 
the different demanded values. For instance, if the function f is defined by the 
rules 

f 0 = 2 

f 1 = 3 

(the integer numbers are considered as an infinite set of constants), then the ex- 
pression “f x” with the free variable x is evaluated to 2 by binding x to 0, or it 
is evaluated to 3 by binding x to 1. Thus, a single computation step may yield a 
single new expression (deterministic step) or a disjunction of new expressions together 
with the corresponding bindings (non-deterministic step). For inductively 
sequential programs (these are, roughly speaking, function definitions without 
overlapping left-hand sides), this strategy, called needed narrowing [1], computes 
the shortest possible successful derivations (if common subterms are shared, as 
usual in implementations of lazy languages) and a minimal set of solutions, and 
it is fully deterministic if free variables do not occur. 
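
The difference between a deterministic and a non-deterministic step for the function f above can be sketched in Python as follows. The encoding is an assumption made for illustration only: a free variable is represented by `None`, each alternative is a pair of a binding for x and the resulting value, and the names `RULES_F` and `narrow_f` are hypothetical.

```python
RULES_F = {0: 2, 1: 3}   # the rules  f 0 = 2  and  f 1 = 3

def narrow_f(arg):
    """Evaluate 'f arg'. A free variable (None) is bound to each value
    demanded by a left-hand side, giving one alternative per rule."""
    if arg is None:
        return [({"x": pattern}, value) for pattern, value in RULES_F.items()]
    if arg in RULES_F:
        return [({}, RULES_F[arg])]   # deterministic step, no binding
    return []                         # no rule applies

print(narrow_f(None))  # [({'x': 0}, 2), ({'x': 1}, 3)]
print(narrow_f(1))     # [({}, 3)]
```

With a ground argument there is at most one alternative, which matches the remark that the strategy is fully deterministic if free variables do not occur.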



Encapsulated search: Since functions in Curry have no side effects, the strat- 
egy to handle non-deterministic computations is not fixed in Curry (in contrast 
to Prolog which fixes a backtracking strategy). To provide flexible application- 
oriented search strategies and to avoid global backtracking like in Prolog which 
causes problems when integrated with I/O and concurrent computations, don’t 
know non-deterministic computations can be encapsulated and controlled by the 
programmer [12]. For this purpose, a search goal is a lambda abstraction \x->c 
where c is the constraint to be solved and x is the search variable occurring 
in c for which solutions should be computed. Based on a single language prim- 
itive to control non-deterministic computation steps, various search strategies 
can be defined (see [12] for details). For instance, findall computes the list 
of all solutions for a search goal with a depth-first strategy, i.e., the expression 
“findall \ps->prefix ps [1,2]” reduces to the list [[],[1],[1,2]] (w.r.t. 
the program in Example 1). 
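
The idea that a search is encapsulated and its solutions are returned as an ordinary data structure can be mimicked with Python generators. This is a loose analogy and not Curry's search primitive: the search goal is modelled as a lazy generator of solutions, `findall` exhausts it, and `once` (a hypothetical name, not taken from the paper) stops after the first solution.

```python
def prefix_goal(xs):
    """Lazily yield every ps such that conc ps ys == xs for some ys."""
    for i in range(len(xs) + 1):
        yield xs[:i]

def findall(goal):
    """Encapsulate the search: collect all solutions into a list."""
    return list(goal)

def once(goal):
    """Return only the first solution (or None); the rest stays unexplored."""
    return next(iter(goal), None)

all_prefixes = findall(prefix_goal([1, 2]))
first_prefix = once(prefix_goal([1, 2]))
print(all_prefixes)  # [[], [1], [1, 2]]
print(first_prefix)  # []
```

Because the goal is only a description of a search until a strategy consumes it, different strategies can be programmed on top of the same goal, which is the point made in the text.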

An important point in the treatment of encapsulated search is that (i) the 
search has only local effects and (ii) non-deterministic steps are only performed if 
they are unavoidable. To satisfy requirement (i), “global” variables (i.e., variables 
that are visible outside the search goal) are never bound in local search steps. To 
satisfy requirement (ii), a possible non-deterministic step in a search goal is sus- 
pended if the search goal contains a global variable (since binding this variable 
outside the search goal might make this step deterministic) or another deter- 
ministic step is possible. This corresponds to the stability requirement in AKL 
[15]. In the context of this paper, the important point is that non-deterministic 
steps are not performed if the search goal has a reference to some global vari- 
able. Since we shall model the coordination of distributed activities by partially 
instantiated global variables, non-deterministic steps are automatically avoided 
if they refer to global communication channels. 

Constraints: In functional logic programs, it is necessary to solve equations between 
expressions containing defined functions (see Example 1). In general, an 
equation or equational constraint $e_1$ =:= $e_2$ is satisfied if both sides $e_1$ and $e_2$ are 
reducible to the same value (data term). As a consequence, if both sides are undefined 
(non-terminating), then the equality does not hold (strict equality [5]). 
Operationally, an equational constraint $e_1$ =:= $e_2$ is solved by evaluating $e_1$ and 
$e_2$ to unifiable data terms where the lazy evaluation of the expressions is interleaved 
with the binding of variables to constructor terms. Thus, an equational 
constraint $e_1$ =:= $e_2$ without occurrences of defined functions has the same meaning 
(unification) as in Prolog. The basic kernel of Curry only provides equational 
constraints. Since it is conceptually fairly easy to add other constraint 
structures, extensions of Curry can provide richer constraint systems to support 
constraint logic programming applications. In this paper, we add one special kind 
of constraint (“port constraint”, see Section 3) to enable the efficient sending of 
messages from different clients to a server. 
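
For the special case mentioned above, an equational constraint between data terms without defined functions, solving amounts to first-order unification. The Python sketch below illustrates that case only; it does not model lazy evaluation of function calls. The representation is our own assumption: variables are strings, constructor terms are tuples `(constructor, arg1, ..., argn)`, other Python values are constants, and an occurs check is included as a design choice of the sketch.

```python
def walk(t, s):
    """Follow the bindings of substitution s until t is unbound or a term."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(x, t, s):
    """Does variable x occur in term t under substitution s?"""
    t = walk(t, s)
    if t == x:
        return True
    return isinstance(t, tuple) and any(occurs(x, a, s) for a in t[1:])

def unify(t, u, s=None):
    """Return a substitution (dict) that makes t and u equal, or None."""
    s = {} if s is None else s
    t, u = walk(t, s), walk(u, s)
    if t == u:
        return s
    if isinstance(t, str):
        return None if occurs(t, u, s) else {**s, t: u}
    if isinstance(u, str):
        return None if occurs(u, t, s) else {**s, u: t}
    if (isinstance(t, tuple) and isinstance(u, tuple)
            and t[0] == u[0] and len(t) == len(u)):
        for a, b in zip(t[1:], u[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None   # constructor or constant clash

# x:[] =:= 1:ys   binds x to 1 and ys to []
solved = unify(("Cons", "x", ("Nil",)), ("Cons", 1, "ys"))
failed = unify(("Cons", 1, ("Nil",)), ("Nil",))
print(solved)  # {'x': 1, 'ys': ('Nil',)}
print(failed)  # None
```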

Concurrent computations: To support flexible computation rules and avoid an 
uncontrolled instantiation of free argument variables, Curry provides the suspen- 
sion of a function call if a demanded argument is not instantiated. Such functions 
are called rigid in contrast to flexible functions which instantiate their arguments 
if it is necessary to proceed with their evaluation. As a default in Curry (which can 
be easily changed), constraints (i.e., functions with result type Constraint) 
are flexible and non-constraint functions are rigid. Thus, purely logic programs 
(where predicates correspond to constraints) behave as in Prolog, and purely 
functional programs are executed as in lazy functional languages like Haskell. 

To continue computations in the presence of suspended function calls, constraints 
can be combined with the concurrent conjunction operator &, i.e., $c_1$ & $c_2$ 
is a constraint which is evaluated by solving $c_1$ and $c_2$ concurrently. There is also 
a sequential conjunction operator &>, i.e., the expression $c_1$ &> $c_2$ is evaluated by 
first evaluating $c_1$ and then $c_2$. 

A design principle of Curry is the clear separation of sequential and concur- 
rent activities. Sequential computations, which form the basic units of a program, 
can be expressed as usual functional (logic) programs, and they are composed to 
concurrent computation units via concurrent conjunctions of constraints. This 
separation supports the use of efficient and optimal evaluation strategies for 
the sequential parts, where similar techniques for the concurrent parts are not 
available. This is in contrast to other, more fine-grained concurrent computa- 
tion models like AKL [15], CCP [22], or Oz [25]. In this paper, we extend the 
basic concurrent computation model to support distributed applications where 
different (external) clients interact. 

Monadic I/O: Since the communication with external programs requires some 
knowledge about performing I/O declaratively, we assume familiarity with the 
monadic I/O concept of Haskell [20,27] which is also used in Curry. Due to lack 
of space, we cannot describe it here in detail but it is sufficient to remember that 
I/O actions are sequentially composed by the operators >>= and >>, putStrLn 
is an action that prints its string argument to the output stream, and done is 
the empty action. Since disjunctive I/O actions as a result of a program are not 
reasonable, all possible search must be encapsulated between I/O operations, 
otherwise the entire program suspends. 

3 From Concurrent to Distributed Computations 

This section motivates the primitives which we add to Curry to support dis- 
tributed applications. Since these primitives should smoothly interact with the 
basic computation model, in particular encapsulated search and local concurrent 
computations, we introduce them as a specialization of the existing features for 
concurrent object-oriented programming. 

It is well known from concurrent logic programming [24] that (concurrent) 
objects can be easily implemented as predicates processing a stream of incoming 
messages. The internal state of the object is a parameter which may change in 
recursive calls when a message is processed. For instance, a counter object which 
understands the messages Set v, Inc, and Get v can be implemented in Curry 
as follows (the predefined type Int denotes the type of all integer values and 
success denotes the always satisfiable constraint): 

data CounterMessage = Set Int | Inc | Get Int

counter eval rigid
counter _ (Set v : ms) = counter v ms
counter n (Inc   : ms) = counter (n+1) ms
counter n (Get v : ms) = v=:=n & counter n ms
counter _ []           = success

The evaluation annotation “counter eval rigid” marks counter as a rigid 
function, i.e., an expression “counter n s” can reduce only if s is a bound vari- 
able. The first argument of counter is the current value of the counter and the 
second argument is the stream of messages. Thus, the evaluation of the con- 
straint “counter 0 s” creates a new counter object with initial value 0 where 
messages are sent by instantiating the variable s. The final rule terminates the 
object if the stream of incoming messages is finished. For instance, the constraint 



let s free in counter 0 s & s =:= [Set 41, Inc, Get x]

is successfully evaluated by binding x to the value 42. Although the stream 
variable s is instantiated at once to all messages in this simple example, it should 
be clear that messages can be individually sent by incrementally instantiating s. 
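
For readers less familiar with Curry, the behaviour of the counter object can be approximated outside the language. The following Python sketch is only a model under stated assumptions: the class Var is a hypothetical stand-in for a logic variable, messages are encoded as tuples, and the stream is a finished list rather than an incrementally instantiated one.

```python
class Var:
    """Single-assignment logic variable (hypothetical helper for this sketch)."""

    _UNBOUND = object()

    def __init__(self):
        self.value = Var._UNBOUND

    def bind(self, value):
        # Models the equational constraint v =:= n: binding an unbound
        # variable succeeds; rebinding to a different value fails.
        if self.value is not Var._UNBOUND and self.value != value:
            raise ValueError("constraint failed")
        self.value = value


def counter(state, messages):
    """Consume a message stream, passing the state on to each 'recursive call'."""
    for msg in messages:
        if msg[0] == "Set":
            state = msg[1]
        elif msg[0] == "Inc":
            state = state + 1
        elif msg[0] == "Get":
            msg[1].bind(state)
    return state  # the stream is finished, so the object terminates


x = Var()
counter(0, [("Set", 41), ("Inc",), ("Get", x)])
print(x.value)  # prints 42
```

As in the Curry version, the caller obtains the result through the variable it placed in the Get message, not through a return value of the object.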

If there is more than one process sending messages to the same counter object, 
it is necessary to merge the message streams from the different processes into 
a single message stream (otherwise, the processes must coordinate themselves 
for message sending). Since the processes work concurrently, the stream merger
must be fair. A fair merger can be implemented in Curry as follows: 

merge eval choice
merge (x:xs) ys     = x : merge xs ys
merge xs     (y:ys) = y : merge xs ys
merge []     ys     = ys
merge xs     []     = xs

The evaluation annotation choice has the effect that at most one rule is applied 
to a call to merge even if there is another applicable rule (where all alternatives 
are evaluated in a fair manner), i.e., this corresponds to a committed choice in 
concurrent logic languages. Although a committed choice restricts the declar- 
ative reading of programs and destroys the completeness results for the basic 
operational semantics [9], such or a similar construct is usually introduced to 
program reactive systems. Using the indeterministic merge function, we can cre- 
ate a counter that accepts messages from different clients: 

counter 0 (merge s1 s2) & client1 s1 & client2 s2
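
To make the effect of merging concrete, the following Python sketch produces one interleaving that a fair merger could yield. It is a deliberate simplification: it alternates deterministically between the two streams, whereas the committed choice above picks whichever rule becomes applicable, and it would block if one input stalled. The message strings are illustrative only.

```python
from itertools import zip_longest

_END = object()  # sentinel marking an exhausted stream


def merge(xs, ys):
    """Alternate between two streams; continue with the rest when one ends."""
    for x, y in zip_longest(xs, ys, fillvalue=_END):
        if x is not _END:
            yield x
        if y is not _END:
            yield y


print(list(merge(["Inc", "Inc", "Inc"], ["Set 7"])))
# prints ['Inc', 'Set 7', 'Inc', 'Inc']
```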

If we want to access the counter object from n different clients, it is straightforward
to use n - 1 mergers to combine the different message streams into a single
one. It has been argued [16] that this causes a significant overhead due to the
forwarding of incoming messages through the mergers. Moreover, this solution 
causes difficulties if the number of clients can change dynamically as in many 
distributed applications. Therefore, Janson et al. [16] proposed the use of ports 
to solve these problems. Ports provide a constant time message merging w.r.t. 
an arbitrary number of senders and a convenient way to dynamically extend the 
number of senders. Therefore, we also propose an extension of the base language 
by ports but embed this concept into concurrent functional logic programming 
(where Janson et al. proposed ports for the concurrent logic language AKL) and 
extend it to communication with external partners. 

In principle, a port is a constraint between a multiset and a list that is satisfied 
if all elements in the multiset occur in the list and vice versa. A port is created by 
evaluating the constraint “openPort p s” where p and s are uninstantiated free 
variables; p and s will later be constrained to the multiset and list of elements,
respectively. Since sending messages is done through p, p is often identified with 
the port and s is the stream of incoming messages. “Port a” denotes the type 
of a port to which messages of type a can be sent, i.e., openPort has the type 
definition 

openPort :: Port a -> [a] -> Constraint

A message is sent to the port by evaluating the constraint “send m p” which 
constrains (in constant time) p and the corresponding stream s to hold the 
element m. From a logic programming point of view, the stream s always has an
uninstantiated variable s_tail at the end and evaluating the send constraint
means evaluating the constraint

let s_tail1 free in s_tail =:= (m : s_tail1)

Thus, the new message is appended at the end of the stream by instantiating 
the current open end of the stream. Since the instantiation is done by solving 
a strict equation (compare Section 2), it is also evident that the message m is
evaluated before sending it (“strict communication”, like in Eden [3]). If the 
communication were lazy, the lazy evaluation of messages at the receiver’s side 
would cause a communication overhead. 
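
The constant-time behaviour of send can be pictured with a small Python model. This is a sketch under assumptions, not the implementation described later in the paper: Cell and Port are hypothetical names, an unbound stream position is modelled by a flag, and no thread safety is attempted.

```python
class Cell:
    """A stream position: unbound until a send instantiates it to head:tail."""

    def __init__(self):
        self.bound = False
        self.head = None
        self.tail = None


class Port:
    def __init__(self):
        self.stream = Cell()          # s: what the receiving object reads
        self._open_end = self.stream  # s_tail: the current unbound end

    def send(self, msg):
        # s_tail =:= (msg : s_tail1); s_tail1 then becomes the new open end.
        cell = self._open_end
        cell.head, cell.tail, cell.bound = msg, Cell(), True
        self._open_end = cell.tail


def received(stream):
    """Collect the messages instantiated so far."""
    out = []
    while stream.bound:
        out.append(stream.head)
        stream = stream.tail
    return out


p = Port()
for sender in ("client1", "client2", "client3"):
    p.send((sender, "Inc"))  # any number of senders share one port
print(received(p.stream))
```

Each send touches only the current open end, so its cost does not depend on the number of senders or on the messages already sent, and a sender never holds a reference to earlier messages.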

Using ports, we can rewrite our counter example with two clients as 

openPort p s &> counter 0 s & client1 p & client2 p

Thus, the code for the object remains unchanged but we have to replace the 
instantiation of the streams in the clients by calls to the send constraint. 

This approach to communication between different processes has remarkable 
consequences: 

— It has a logical reading, i.e., communication is not done by predicates or 
functions with side effects (like, e.g., the socket library of Sicstus-Prolog) but 
can be described as instantiation of logical variables and constraint solving. 
Thus, the operational semantics of our communication primitives is a simple 
extension of the operational semantics of the base language. 

— It interacts smoothly with the operational principles of the base language. 
For instance, local search and non-deterministic computations are only pos- 
sible if the search goal contains no reference to global variables (compare 
Section 2). Thus, it is impossible to send messages to global ports inside lo- 
cal search computations or to split a server object into two non-deterministic 
computation threads. This is perfectly intended, since backtracking on 
network-oriented applications or copying server processes to interact 
with non-deterministic clients is difficult to implement. 

— It provides an efficient implementation since message sending can be imple-
mented without forwarding through several mergers and the senders have no
reference to old messages, i.e., the multiset of the port need not be explicitly
stored.

— Partially instantiated messages containing free variables (e.g., message 
“Get x” ) provide an elegant approach to return values to the sender without 
explicitly creating reply channels. 

— The number of senders can be dynamically extended — every process which 
gets access to the port reference (the multiset variable) can send messages 
to the port. This property can be exploited in many distributed applications 
(see below). 

Up to now, we can use ports only inside one program (similarly to [16]) but for 
many distributed applications (like Internet servers) it is necessary to commu- 
nicate between different programs. Therefore, we introduce two operations to 
create and connect to external ports, i.e., ports that are accessible from outside. 
Since connecting ports to the outside world changes the environment of
the program, these operations are introduced as I/O actions (see Section 2).

The I/O action “openNamedPort n” creates a new external port with name 
n and returns the stream of incoming messages. If this action is executed on 
machine m (where m is a symbolic Internet name), the port (but not the stream 
of incoming messages) is now globally accessible as “n@m” by other applications.
On the client side, the I/O action “connectPort pn” returns the external port
which has the symbolic name pn so that clients can send messages to this port. 
For instance, to create a globally accessible “counter server” , we add the following 
definitions to our counter: 

main = openNamedPort "counter" >>= counter_server
counter_server s | counter 0 s = done

If we execute main on the machine medoc.cs.rwth.de, we can implement a 
client by 

client port_name msg = connectPort port_name >>= sendPort msg
sendPort msg p | send msg p = done

and increment the global counter by evaluating

client "counter@medoc.cs.rwth.de" Inc

Before we present some more interesting examples, we introduce a final primitive 
which has no declarative meaning but is useful in real distributed applications. 
Since the communication over networks is unsafe and a selected server could
be down or may not respond in a given period of time, one wants to take another
action (for instance, choosing a different server or informing the user) if this
happens. Therefore, we introduce a temporal constraint “after t” which is sat- 
isfied t milliseconds after the constraint has been checked for the first time, i.e., 
“after 0” is immediately satisfied. Typically, this temporal constraint is used 
as an alternative in a committed choice like in 

getAnswer eval choice
getAnswer (msg:_)        = msg
getAnswer _ | after 5000 = <take an alternative action>

For instance, if getAnswer is called with a stream of a port as an argument, 
it returns the first message if it is received within five seconds, otherwise an 
alternative action is taken. 
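
The same timeout pattern can be sketched in Python, assuming the stream of a port is modelled by a queue.Queue. The function name get_answer and the alternative callback are illustrative choices; the standard-library call Queue.get(timeout=...) raises queue.Empty when no item arrives in time.

```python
import queue


def get_answer(stream, timeout_ms, alternative):
    """Return the first message, or the alternative's result after a timeout."""
    try:
        return stream.get(timeout=timeout_ms / 1000.0)
    except queue.Empty:
        return alternative()


inbox = queue.Queue()
inbox.put("hello")
print(get_answer(inbox, 50, lambda: "no answer"))          # prints hello
print(get_answer(queue.Queue(), 50, lambda: "no answer"))  # prints no answer
```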

The following type definitions summarize the proposed new primitives to
support the development of distributed applications: 

-- open an internal port for messages of type "a":
openPort :: Port a -> [a] -> Constraint

send :: a -> Port a -> Constraint  -- send message to port

-- open a new external port, return stream of messages:
openNamedPort :: String -> IO [a]

-- connect to external port, return port for sending messages:
connectPort :: String -> IO (Port a)

after :: Int -> Constraint  -- timeout

4 Examples

In this section, we demonstrate the use of the primitives for distributed appli- 
cations introduced in the previous section. In order to avoid presenting all the 
tedious details of such applications, we have simplified the examples so that we 
concentrate on the communication structures. 



4.1 A Name Server 

The first example represents a class of client/server applications where the server 
holds some database which is requested by the clients. For the sake of simplicity, 
we consider a simple name server which stores an assignment from symbolic 
names to numbers. It understands the messages “PutName n i” to store the name
n with number i and “GetName n i” to retrieve the number i associated with the
name n. The name server is implemented as a function which has the assignment 
from names to numbers as the first argument (function n2i below) and the 
incoming messages as the second argument (initially, 0 is assigned to all names 
by the lambda abstraction \_->0): 

nameserver = openNamedPort "nameserver" >>= ns_loop (\_->0)

ns_loop n2i (GetName n i : ms) | i=:=(n2i n) = ns_loop n2i ms
ns_loop n2i (PutName n i : ms) = ns_loop new_n2i ms
  where new_n2i m = if m==n then i else n2i m

In the first rule of ns_loop, the (usually uninstantiated) variable i is instantiated 
with the number assigned to the name n by solving the equational constraint in 
the condition. In the second rule, a modified assignment map new_n2i is passed 
to the recursive call. If we evaluate nameserver on the machine 
medoc.cs.rwth.de, then we can add the assignment of the name talk to the 
number 42 by evaluating 

client "nameserver@medoc.cs.rwth.de" (PutName "talk" 42)

on some machine connected to the Internet (where client was defined in Sec- 
tion 3). After this assignment, the evaluation of 

client "nameserver@medoc.cs.rwth.de" (GetName "talk" x)

binds the free variable x to the value 42. 

Note that the sending of messages containing free variables is an elegant way 
to return values to the sender. Here we exploit the fact that the base language 
is an integrated functional logic language which can deal with logical variables. 
Functional languages extended for distributed programming like Eden [3], Er- 
lang [2], or Goffin [4] require the explicit creation or sending of reply channels. 

An extension of our name server should demonstrate the advantages of using 
logical variables in messages. Consider a hierarchical name server organization: 
if the local name server has no entry for the requested name (i.e., the assigned 
number is the initial value 0), it forwards this request to another name server. 
This can be easily expressed by changing the first rule of ns_loop to 

ns_loop n2i (GetName n i : ms)
  | if (n2i n)==0 then send (GetName n i) master else i=:=(n2i n)
  = ns_loop n2i ms

It is assumed that master is the port of the other name server to which the
request is forwarded. Note that the local name server can immediately proceed
with its service after forwarding the request to the master server and need not wait
for the answer from the master, since the master becomes responsible for binding
the free variable in the GetName message.
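
The forwarding scheme can be modelled in a few lines of Python. The sketch below makes several assumptions that are not part of the paper: Var is a hypothetical single-assignment variable, the assignment map is a dictionary instead of a function, and the master's port is represented by a plain list that the master processes later.

```python
class Var:
    """Single-assignment logic variable (hypothetical helper for this sketch)."""

    def __init__(self):
        self.bound = False
        self.value = None

    def bind(self, value):
        self.bound, self.value = True, value


def ns_loop(n2i, messages, master=None):
    """Process name-server messages; `master` is a list standing in for a port."""
    for msg in messages:
        tag, name, arg = msg
        if tag == "PutName":
            n2i = {**n2i, name: arg}   # corresponds to new_n2i
        elif tag == "GetName":
            number = n2i.get(name, 0)  # 0 is the initial value of every name
            if number == 0 and master is not None:
                master.append(msg)     # forward the request, do not wait
            else:
                arg.bind(number)       # i =:= (n2i n)
    return n2i


x = Var()
master_inbox = []
ns_loop({}, [("GetName", "talk", x)], master=master_inbox)
print(x.bound)                       # False: the local server has moved on
ns_loop({"talk": 42}, master_inbox)  # the master answers the same message
print(x.value)                       # 42
```

The local server never sees the answer; the client's variable is bound directly by whichever server finally handles the forwarded message.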

If the requested name server is down so that no answer is returned, one would 
like to inform the user about this fact instead of waiting indefinitely. This can be
easily implemented with a temporal constraint by the following function: 

showAnswer eval choice
showAnswer ans | ans==ans   = show ans
showAnswer _   | after 10000 = "No answer from name server"

“t1==t2” denotes strict equality on ground data terms like in Haskell, i.e.,
if t1 or t2 reduces to a data term containing an uninstantiated variable, the
evaluation of this equality is suspended until the variable has been bound to
some ground data term. Thus, “showAnswer x” yields a string representation
of the value of x if it evaluates to a ground data term or it yields the string
"No answer from name server" if x has not been bound to a ground term
within ten seconds. Thus, the evaluation of

client "nameserver@medoc.cs.rwth.de" (GetName "talk" x)
  >> putStrLn (showAnswer x)

prints the value assigned to talk or the required timeout message. 



4.2 Talk 

The next example shows a distributed application between two partners where 
both of them act as a server as well as a client. The application is a simplification 
of the well known Unix “talk” program. Here we consider only the main talk 
activity (and not the calling of the partner via the talk daemon) where each 
partner program must do the following (we assume that each partner has an 
external talk port with symbolic name talk to receive the messages from the 
partner): 

— If the user inputs a line on the keyboard (which is transmitted through the 
port with symbolic name stdin), this line is sent to the talk port of the 
partner. 

— If the program receives a line from the partner through its own talk port, this 
line (preceded by ‘*’) is shown at the screen by the I/O action putStrLn. 

Fig. 1. Communication structure of the talk program

Since the sequence of both events is not known in advance, the standard input
port as well as the talk port must be examined in parallel. For this purpose, we
use a committed choice. Thus, the talk program consists of a loop function tloop
which has three arguments: the talk port of the partner, the stream connected
to its own standard input, and the stream connected to its own talk port:

tloop eval choice
tloop your tty (m:ms) = putStrLn ('*':m) >> tloop your tty ms
tloop your (m:ms) my  = sendPort m your >> tloop your ms my

The tloop is activated by the following main program:¹

talk your_portname = do my_port   <- openNamedPort "talk"
                        tty_port  <- openNamedPort "stdin"
                        your_port <- connectPort your_portname
                        tloop your_port tty_port my_port

If a user on machine m1 wants to talk with the user on machine m2, they must
evaluate

on machine m1: talk "talk@m2"
on machine m2: talk "talk@m1"

The communication structure created by these calls is shown in Fig. 1. 



4.3 A Computation Server 

Since our communication through ports is strict, i.e., messages are evaluated 
before sending them (cf. Section 3), there is no direct way to distribute compu- 
tational work like remote procedure calls (RPCs) where procedures are evaluated 
at some other node in the network. Although port communication corresponds 
to message passing, we can easily implement RPCs using the higher-order fea- 
tures of the base language. For instance, a computation server, i.e., a process 
running on some node in the network offering to execute some work by evaluat- 
ing functions, can be implemented as a function accepting messages containing 
triples (f,x,y) where f is a function to be applied to the actual argument x
and y is a free variable which is instantiated with the result of f x. Thus, the
entire computation server can be implemented as follows:

start_compserver = openNamedPort "compserver" >>= compserver
compserver ((f,x,y) : ms) | y=:=(f x) = compserver ms

¹ Here we make use of Haskell’s do notation [20] where “do p1<-e1; ...; pn<-en; e” is
syntactic sugar for “e1 >>= \p1->...en >>= \pn->e”.

If prime is a function to compute the n-th prime number, we can use this com- 
putation server to compute prime numbers, e.g., the execution of 

client "compserver@cs" (prime,1000,p)

binds the free variable p to the 1000th prime number where the computation is 
performed on the node cs where the server has been started. This remarkably
simple implementation needs some comments.

1. In Section 2, we introduced the constraint =:= as equality on data terms
and, thus, it might be unclear how we can send functional objects in messages.
For this purpose, we consider partially applied functions, i.e., functions
where the number of actual arguments is less than their arity, as data terms
since they are not evaluable. This conforms to standard methods to add
higher-order features to logic programming [28] and is theoretically justified
for lazy functional logic languages in [6]. As a consequence, an equation like
“x=:=prime” is solved by binding the variable x to the function name prime.
Since partially applied function calls are considered as data terms, the code
implementing the function is not immediately sent in the above message but
it will be transferred from the client to the server when the server evaluates
it (dynamic code fetching).

2. The RPC is asynchronously performed since the client sends its request 
without explicitly waiting for the answer. The client can proceed with other 
computations as long as it does not need the result of this call which is 
passed back through the third argument of the message. Thus, the free result 
variable is similar to a “promise” which has been proposed by Liskov and 
Shrira [17] to overcome the disadvantages of synchronous RPCs. A promise 
is a special place holder for a future return value from an RPC. Since we 
can use the logic part of the base language for this purpose, no linguistic 
extension is necessary to implement asynchronous RPCs. 

3. The attentive reader might raise the question what happens if the execution 
of the transmitted function causes a non-deterministic computation step. 
Does the server split into two disjunctive branches? This does not happen 
since, as mentioned at the end of Section 2, non-deterministic steps between 
I/O actions are suspended. One method to avoid this suspension is to return 
only the first solution to the sender. This can be done by encapsulating the 
search, i.e., we could replace the constraint “y=:=(f x)” by the expression

y =:= head (findall \z -> z=:=(f x)) 

A disadvantage of the above computation server is the fact that the complete 
server is blocked if the evaluation of a single RPC is suspended or takes a long 
time. Fortunately, it is very simple to provide a concurrent version of this server
using the concurrency features of the base language. For this purpose, we turn 
the server function into a constraint and evaluate the RPC in parallel to the 
main server process: 

start_compserver = openNamedPort "compserver" >>= serve
  where serve ms | compserver ms = done

compserver eval rigid²
compserver ((f,x,y) : ms) = y=:=(f x) & compserver ms
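
The idea of a concurrent computation server with promise-like result variables can be approximated in Python. The sketch below is an analogy under assumptions, not a rendering of the Curry semantics: the port is a queue.Queue, the free result variable is a concurrent.futures.Future created directly by the client, the None shutdown marker is an addition for the demo, and nth_prime is a small hypothetical stand-in for prime.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def nth_prime(n):
    """Return the n-th prime (1-based) by trial division; fine for small n."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate


def evaluate(f, x, y):
    # y =:= (f x): fulfil the promise, or record why it could not be fulfilled.
    try:
        y.set_result(f(x))
    except Exception as exc:
        y.set_exception(exc)


def compserver(inbox, pool):
    """Hand every request to the pool so one slow call cannot block the server."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown marker, not part of the paper's protocol
            return
        pool.submit(evaluate, *msg)


def rpc_demo():
    inbox = queue.Queue()
    with ThreadPoolExecutor(max_workers=4) as pool:
        server = threading.Thread(target=compserver, args=(inbox, pool), daemon=True)
        server.start()
        p = Future()                   # plays the role of the free variable p
        inbox.put((nth_prime, 10, p))  # asynchronous request
        result = p.result(timeout=5)   # block only when the value is needed
        inbox.put(None)
        server.join()
    return result


print(rpc_demo())  # prints 29
```

Unlike the Curry version, this model needs an explicit promise object and explicit threads; in the paper both roles are played by logical variables and the concurrent conjunction &.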



4.4 Encrypting Messages 

To support more security during message sending, messages should be encrypted 
before sending. For this purpose, public key methods are often used. The idea 
of public key methods is to encode a message with a key before sending and to 
decode the message with another key after receiving. Both keys must be chosen 
in a way so that decoding the encoded message gives the original message back. 
Since the coding algorithm as well as one key are publicly known, it is essential 
for the security of the method to choose keys that are large enough. 

In the following, we use a similar idea but functions instead of keys, i.e., the 
encoding algorithm as well as the key is put into a single function. Thus, one has 
to choose a public encrypt function e and a private decrypt function d so that 
d(e(m)) = m for all messages m (the additional property e(d(m)) = m would
be necessary for authentication).

As a simple example, we show a server which processes requests and returns 
the answers encrypted. The public encrypt function is sent together with the 
message. This has the advantage that for each message and client, another en- 
cryption can be used. Since there are a huge number of encrypt/decrypt function 
pairs, the functions could be relatively simple without sacrificing security. Sim- 
ilarly to the computation server, this server receives triples (e,rq,rs) where e 
is the public encrypt function, rq is the request to the server and rs will be 
instantiated to the encrypted result (the unspecified function computeanswer 
determines the main activity of the server): 

start_crypticserver = openNamedPort "cryptserver" >>= cserver

cserver ((encode,rq,rs) : ms) | rs =:= encode (computeanswer rq)
                              = cserver ms

For strings, i.e., lists of characters, the pair rev/rev (list reversing) is a simple 
encrypt/decrypt pair. Thus, we can send a request to the server and decode the 
answer by 

client "cryptserver@cs" (rev,"Question...",y) >> show (rev y)

Although this example is simplified, it should be obvious that further features 
like authentication can be easily added. 
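
The round trip d(e(m)) = m with the rev/rev pair can be checked with a short Python model. As before, this is a sketch under assumptions: Var is a hypothetical logic variable, the request stream is a list, and str.upper merely stands in for the unspecified function computeanswer.

```python
class Var:
    """Single-assignment logic variable (hypothetical helper for this sketch)."""

    def __init__(self):
        self.bound = False
        self.value = None

    def bind(self, value):
        self.bound, self.value = True, value


def rev(text):
    """List reversal on strings; it is its own inverse, so d = e = rev."""
    return text[::-1]


def cserver(requests, compute_answer):
    """Answer each request, encrypted with the function sent by the client."""
    for encode, rq, rs in requests:
        rs.bind(encode(compute_answer(rq)))  # rs =:= encode (computeanswer rq)


y = Var()
cserver([(rev, "Question...", y)], str.upper)
print(y.value)       # prints ...NOITSEUQ (what travels over the network)
print(rev(y.value))  # prints QUESTION... (decoded by the client)
```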

² The rigid annotation is necessary since constraints are flexible by default in Curry.

5 Implementation

The full implementation of the presented concepts is ongoing. We have tested 
the examples in this paper with a prototypical implementation of Curry based 
on Sicstus-Prolog. In this implementation, we used the socket library of Sicstus- 
Prolog to implement the port communication via sockets. Free variables sent 
in messages are implemented by dynamic reply channels on which the receiver 
sends the value back if the variable is instantiated. 

Currently, we are working on a more efficient implementation based on the 
compiler from Curry into Java described in [11]. In this implementation, we use 
the distribution facilities of Java to implement our communication model. In 
particular, we use Java’s RMI model to implement ports. Sending a message 
amounts to binding a free variable (the stream connected to the port) by a 
method call on the remote machine. Free variables sent in messages reside in the
sender’s computation space, and if the receiver binds such a variable, it calls a remote
method on the sender’s machine to bind this variable. The implementation
of functional objects sent in messages is more advanced. It could be implemented 
by sending a reference to the code that implements this function. If the function 
is applied and evaluated by the receiver, the function code is dynamically loaded 
from the sender to the receiver (dynamic code fetching). 

6 Related Work 

Since features for concurrent and distributed programming become important 
for many applications, there are various approaches to extend functional or
logic programming languages with such features. In the following, we relate our 
proposal to some of the existing ones. 

Initiated by Japan’s fifth generation project, various approaches to add con- 
currency features to logic programming [23] have been proposed culminating in 
Saraswat’s framework for concurrent constraint programming [22]. Usually, these 
approaches consider only concurrency inside an application but provide no fea- 
tures for connecting different programs to a distributed system. The concurrent 
logic language AKL [15] also supports only concurrency inside a program but 
proposed ports [16] for the efficient communication between objects. Ports have
also been adapted to Oz [25], where they have also been embedded into a framework
for distributing the computational activities over a network [26]. In contrast to 
our approach, ports are not a primitive constraint but are implemented by the 
stateful features of Oz. All these languages are strict (and untyped) while our 
proposal combines optimal lazy reduction for the sequential computation parts 
with strict communication between the distributed and concurrent entities. 

Concurrent Haskell [21] extends the lazy functional language Haskell by 
methods to start processes inside an application and synchronize them with 
mutable variables, but facilities for distribution are not provided. Closest to our 
approach w.r.t. the communication features are Erlang [2] and an extension of
Goffin [4]. Erlang is a concurrent functional language developed for telecommunication
applications. Processes in Erlang can communicate over a network
via symbolic names, which provides for communication between different applications.
In contrast to our proposal, Erlang is a strict and untyped language
and provides no features for logic programming. Thus, partial messages cannot
be sent, so that explicit reply channels (process identifiers) must be included in
messages where answers should be sent back. The extension of Goffin described
in [4] extends a lazy typed concurrent functional language by a port model for 
internal and external communication. Although it uses logical variables for syn- 
chronization, it does not provide typical logic programming features like search 
for solutions. In contrast to our proposal for communication, partial messages
including logical variables are not supported, the creation of connections to external
ports is not integrated in the I/O monad (and, hence, I/O operations
like reading/writing files cannot be used in a distributed program) and, once
a port is made public on the network, every node can not only send messages
to this port but can also read all messages incoming at this port. The latter
property may cause security problems for many distributed applications. This is 
avoided in our proposal by allowing only one server process to read the incoming 
messages at an external port. 



7 Conclusions 



We have proposed an extension of the concurrent functional logic language Curry 
that supports a simple implementation of distributed applications. This exten- 
sion is based on communication via ports. The important point is that the 
meaning of port communication can be described in terms of computation with 
constraints. This has the consequence that (i) the communication mechanism in- 
teracts smoothly with the existing language features for search and concurrency 
so that all these features can be used to program server applications, and (ii) 
existing programs can be fairly easily integrated into a distributed environment.
Moreover, the use of logical variables in partially instantiated messages is quite 
useful to avoid complicated communication structures with reply channels. Nev- 
ertheless, external communication ports can be given a symbolic name so that 
they can be passed in messages as in the π-calculus [18]. We have demonstrated
the appropriateness and feasibility of our language extensions by implementing 
several distributed applications. As far as we know, this is the first approach 
which combines functional logic programming based on a lazy (optimal) evalu- 
ation strategy with features for concurrent and distributed programming. 

For future work, we will investigate the application of program analysis tech- 
niques to ensure the safe execution of distributed applications. For instance, 
deadlock exclusion can be approximated by checking groundness of relevant vari- 
ables [8] or the non-conflicting use of free variables transmitted in messages could 
be ensured by proving that they are instantiated by at most one receiver. 

Logical and Meta-Logical Frameworks

(Abstract) 
Logical frameworks have been designed as meta-languages in which deduc-
tive systems can be specified naturally and concisely. By providing direct support 
for common concepts of logics and programming languages, framework imple- 
mentations such as Isabelle allow the rapid construction of theorem proving 
environments for specific logics. Logical frameworks have found significant ap- 
plications in a variety of areas, including program and protocol verification and 
safe execution of mobile code. 

Recently, researchers have exploited the directness of encodings of deductive 
systems in logical frameworks to reason not only within but about logics. At the 
core of these efforts lies the design and implementation of meta-logical frame- 
works — languages in which properties of logical systems can be expressed and 
proven. 

In this tutorial talk we first provide a brief introduction to the central tech- 
niques of logical frameworks. We then analyze the requirements for meta-logical 
frameworks and sketch and compare three different approaches: inductive definitions [1],
definitional reflection [2], and dependent pattern matching and recursion [3].
The last appears to be most amenable to automation and we discuss
its design and implementation in the Twelf system in more detail. Recent successful
experiments with this implementation include automatic proofs of cut-elimination
for full first-order intuitionistic logic, the diamond property for parallel
reduction in the untyped λ-calculus, and the soundness and completeness
of uniform derivations for hereditary Harrop formulas. 

Abstract. We study the problem of producing a good compiler for Pro-
log that generates efficient code for its input programs on the basis of 
the information inferred by a static analysis of those programs. 

The main contribution of our work is to show that in many cases such
an optimizing compiler can be obtained by a simple modification of an 
already existing Prolog compiler. 

Our general method is illustrated by describing how the SICStus compiler
has been modified in such a way that it uses information about
uninitialized variables in order to generate better code than it would
generate without that information. We give tables that measure the costs
and advantages of producing that optimizing SICStus compiler. In order
to show the generality of our approach, we also present the design of a
simple modification of the SICStus compiler incorporating recursively dereferenced
variables.

Keywords: Abstract Interpretation, Logic Programming, WAM. 



1 Introduction 

Since the 1960s some form of data flow or static analysis has been included in
compilers in order to generate efficient code. However, those analyses were in
general ad hoc, and no general theory existed until the pioneering work of the
Cousots [4]. The general theory introduced by the Cousots, called abstract
interpretation, is fundamental both for facilitating the design of static
analyses and for proving their correctness. Abstract interpretation has been
applied to all programming paradigms. Prolog has been extensively studied in
this respect because a number of interesting run-time properties of the
language can be captured by static analyses. Unfortunately, only a very
limited number of optimizing Prolog compilers using those static analyses have
been produced. We are aware of only two such compilers: Aquarius by Van Roy
[13] and PARMA by Taylor [12]. These two compilers have been very valuable for
showing the impact that static analysis can have on generating efficient code;
however, they are “academic” compilers, much less efficient than industrial
Prolog compilers such as Quintus and SICStus. Therefore, these works have had
a very limited impact on people using Prolog for industrial applications.

We have followed a different approach: instead of constructing a new opti- 
mizing compiler from scratch, we have modified an already existing industrial 
compiler in such a way that it can cooperate with a static analyzer in order to 
generate better code. 

The compiler that we have transformed in this way is the SICStus compiler,
and we have made it cooperate with several analyses that infer different
run-time properties (uninitialized variables and variable safeness). The work
presented in this paper is actually the final part of a broader project that
also includes the formal design of those static analyses within the Cousots’
abstract interpretation framework and the proof of their correctness. The
complete account of this project can be found in [2].

To show the generality of our approach, we describe in this paper how
static analyzers inferring information about uninitialized variables and
recursively dereferenced variables can be incorporated within SICStus. A
variable is uninitialized if its store location (in the WAM architecture) need
not be initialized when that variable is first encountered [13]. We give an
experimental evaluation demonstrating the effectiveness of our integration of
uninitialized variables within SICStus. A variable is recursively dereferenced
if the store location (in the WAM architecture) containing its value can be
accessed directly, i.e., without indirect addressing [13]. Moreover, if the
value of that variable is a compound term, then the arguments of that term
must also be recursively dereferenced.

The method illustrated by these examples can also be followed in other cases
(as we have done in [2]), but it is not always equally convenient: it is
convenient when the optimization supported by the information produced by a
static analysis does not require an important change in the abstract machine
model underlying the compiler. For the SICStus compiler the underlying abstract
machine is the Warren Abstract Machine (WAM) [1]. For instance, the
optimization of SICStus supported by an uninitialized variable analysis
requires, as described later, the introduction of some new operations
(specializing the traditional WAM operations), with no change to the basic WAM
architecture of the SICStus compiler.

An example of information whose use could not be integrated easily into
SICStus is that concerning uninitialized registers, which are used to store
output values when returning from a procedure call [13]. The use of these
registers interferes with last call optimization and environment trimming [3].
Thus, in this case the approach followed by Van Roy, namely defining a new
abstract machine (called the BAM) and constructing a new compiler adopting
that architecture, is probably the most appropriate.

We think that our work is a useful step towards two important goals: 

1. The production of efficient optimizing compilers exploiting sophisticated
static analyses that are formally defined and proven correct and that are
also really competitive with the industrial compilers used in Prolog
applications.
2. The definition of a general method (possibly supported by software tools)
for integrating static analyzers within a (good) compiler. The ability to
perform such integrations rapidly would be very important for quickly testing
the practical value of new static analyses.

The rest of the paper is organized as follows. Section 2 contains an example
illustrating how uninitialized variables are detected and used. Section 3
briefly describes the implementation of our analyzer for detecting
uninitialized variables and gives the details of the integration of that
analyzer into the SICStus Compiler and Emulator. Section 4 presents some
statistical evaluations of our version of SICStus. Sections 5 and 6 show that
the method presented in Section 3 can also be applied to incorporate
recursively dereferenced variables in SICStus. Finally, Section 7 closes this
paper.

2 Uninitialized Variable Analysis of nreverse/2 

In a Prolog program a variable is uninitialized [13] when in all computations
it just receives a non-variable value, i.e., a compound term or a constant. As
usual in the WAM architecture, those variables are assigned a store cell, but
clearly those cells need not be initialized because they will be assigned a
value immediately afterwards.

In what follows, by means of a simple example, we will show that the knowl- 
edge that a Prolog program contains uninitialized variables supports a significant 
optimization of the code we can generate for that program. 

The example we consider is the program P that defines the (naive) reverse of 
a list. The program P is given below in a convenient form in which every atom 
has distinct variables in its argument positions. 

cl0: main :- V1 = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,
                   19,20,21,22,23,24,25,26,27,28,29,30],
             nreverse(V1,V2).

cl1: nreverse(V1,V2) :- V1 = [], V2 = [].

cl2: nreverse(V1,V2) :- V1 = [V3|V4], nreverse(V4,V5), V6 = [V3],
                        append(V5,V6,V2).

cl3: append(V1,V2,V3) :- V1 = [], V2 = V3.

cl4: append(V1,V2,V3) :- V1 = [V4|V5], V3 = [V4|V6],
                         append(V5,V2,V6).

Through a static analysis of this program one can detect that the second ar- 
gument of nreverse/2 and the third argument of append/3 are uninitialized 
variables. This result is found using the analyzer described in [2], the analyzer 
of [13] and that described in [7]. 

Let us explain intuitively why this result is correct. An analysis starts from
predicate main/0 and simulates SLD resolution [8]. Predicate nreverse/2 is
called, in cl0, with its second argument V2 free and unaliased.

Clearly, cl1 binds the second argument of nreverse/2 to the empty list.
For cl2, it is also immediate to see that in the recursive call of nreverse/2
the second argument is again free and unaliased. However, some more thinking
is needed to derive that the second argument of the head of cl2 is bound to a
non-variable value by the call of append/3.

In order to see this fact, the analyzer must discover that append/3 is always
called with the second argument bound to a non-variable value and with the
third argument free and unaliased. This fact, together with cl3, implies that
the third argument of append/3 is uninitialized. Thus, the third argument of
append/3 will be bound to a non-variable value when returning from every call
executed in P. This holds in particular for the call in the body of cl2, and
hence the second argument of the head of cl2 gets assigned a non-variable
value.

From the definition of uninitialized variables given before, it follows
directly that any static analysis aiming at inferring this information also
has to infer information about the variables that are bound to non-variable
values. In fact, all uninitialized variable analyses we know of, i.e., our
analysis [2], that of Van Roy [13], and that of Lindgren [7], also compute
that information.¹ More precisely, our abstract domain is the set of 3-tuples
$(N, F, U)$ of sets of program variables such that:

- $N$ is the set of variables having a non-variable value;

- $F$ is the set of free and unaliased variables;

- $U$ is the set of uninitialized variables;

- $N \cap F = \emptyset$;

- $U \subseteq N$.

We say that $(N_1, F_1, U_1) \le (N_2, F_2, U_2)$ if $N_2 \subseteq N_1$,
$F_2 \subseteq F_1$, and $U_2 \subseteq U_1$ and, in order to have a complete
lattice, we add a bottom element $\bot$ such that
$\forall (N, F, U): \bot \le (N, F, U)$. The definition of a formal semantics
and of the operations on our abstract domain can be found in [2].
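
As a minimal sketch of the abstract domain just described, the triples and their ordering can be written down directly. The class name `AbstractState`, the field names, and the function `leq` are hypothetical and are not taken from [2]; the sketch assumes that the ordering is componentwise reverse inclusion, and it leaves out the bottom element and the abstract operations.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AbstractState:
    """One element (N, F, U) of the abstract domain (hypothetical encoding)."""
    nonvar: FrozenSet[str]   # N: variables having a non-variable value
    free: FrozenSet[str]     # F: free and unaliased variables
    uninit: FrozenSet[str]   # U: uninitialized variables

    def __post_init__(self) -> None:
        # The two side conditions on the triples stated in the text.
        if self.nonvar & self.free:
            raise ValueError("N and F must be disjoint")
        if not self.uninit <= self.nonvar:
            raise ValueError("U must be a subset of N")


def leq(a: AbstractState, b: AbstractState) -> bool:
    """a <= b: each set of b is contained in the matching set of a
    (assumed componentwise reverse inclusion)."""
    return (b.nonvar <= a.nonvar
            and b.free <= a.free
            and b.uninit <= a.uninit)


precise = AbstractState(frozenset({"V1", "V2"}), frozenset({"V3"}), frozenset({"V2"}))
coarse = AbstractState(frozenset({"V1"}), frozenset(), frozenset())
print(leq(precise, coarse), leq(coarse, precise))
```

Under this reading, a state that knows more about the program variables sits lower in the order than a state that knows less.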

In the following section we will describe how uninitialized variables support 
the generation of optimized code for the program P presented above. 



2.1 Use of Uninitialized Variables 

It is immediate from the definition of uninitialized variables that, when we
know that some variable of a program is uninitialized, we can avoid
initializing that variable. This amounts to replacing the normal put
instruction, which would be used to create a memory location for that
variable, with a new specialized and more efficient put instruction that does
not initialize the location assigned to that variable. As a matter of fact,
the situation is a bit more complicated than this: as explained below,
specialized get instructions are also needed. However, the point we want to
stress is that the new instructions are a completely straightforward
specialization of the corresponding original WAM instructions.

Let us consider again program P given in Section 2. In a normal compilation
of that program, the variable V2 of clause cl0 would be allocated through a
“put_void A2” (where A2 is the second argument of nreverse/2). Using the
knowledge that V2 is uninitialized, an optimizing compiler would replace
“put_void A2” with the more efficient “put_uninit_void A2”, whose definition
is shown in the table below.

put_void A2                      put_uninit_void A2
-------------------------------  -------------------------------
A2 = HEAP[H] = ⟨REF, H++⟩;       A2 = ⟨REF, H++⟩;
P += instruction_size(P);        P += instruction_size(P);

¹ The analysis of Van Roy also computes groundness information and the set of
recursively dereferenced terms.



Clearly, having uninitialized arguments in a call also requires new
specialized get instructions in the clauses that can be activated by that
call. The specialized get instructions must simply avoid dereferencing those
arguments that are uninitialized. Notice, again, that the new instructions are
more efficient than the original WAM instructions.
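
The saving can be made concrete with a toy store, sketched here under simplifying assumptions: there is no trail, no failure handling and no program counter, and the names `Machine` and `writes` are hypothetical. The sketch only counts heap writes to show where the specialized put and get instructions skip work.

```python
REF = "REF"
CON = "CON"


class Machine:
    """Toy WAM-like store: a heap of tagged cells plus argument registers."""

    def __init__(self):
        self.heap = {}     # address -> (tag, value)
        self.h = 0         # next free heap address
        self.regs = {}     # argument registers, e.g. "A2"
        self.writes = 0    # number of heap writes performed so far

    def _store(self, addr, cell):
        self.heap[addr] = cell
        self.writes += 1

    def put_void(self, reg):
        # Create a self-referencing (unbound) cell and point the register at it.
        cell = (REF, self.h)
        self._store(self.h, cell)
        self.regs[reg] = cell
        self.h += 1

    def put_uninit_void(self, reg):
        # Reserve the cell but do not initialize its contents.
        self.regs[reg] = (REF, self.h)
        self.h += 1

    def deref(self, cell):
        # Follow reference chains until an unbound variable or a value.
        while cell[0] == REF:
            target = self.heap[cell[1]]
            if target == cell:
                return cell
            cell = target
        return cell

    def get_constant(self, c, reg):
        tag, val = self.deref(self.regs[reg])
        if tag == REF:
            self._store(val, (CON, c))   # bind the unbound variable
            return True
        return tag == CON and val == c

    def get_uninit_constant(self, c, reg):
        # No dereferencing and no test: the cell is known to be uninitialized.
        _, addr = self.regs[reg]
        self._store(addr, (CON, c))
        return True


plain = Machine()
plain.put_void("A2")
plain.get_constant("[]", "A2")

optimized = Machine()
optimized.put_uninit_void("A2")
optimized.get_uninit_constant("[]", "A2")

print(plain.writes, optimized.writes)
```

Both machines end with the same heap contents, but the optimized pair writes the cell once instead of twice and never dereferences the argument.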

In order to illustrate the global optimization that can be obtained for
program P, in what follows we present the usual WAM code produced for it and
then point out the optimizations that uninitialized variables make possible.
Dots are used to replace the WAM code allocating the list of the first 30
natural numbers.

main/0:
        put_list X29
        set_constant 30
        set_constant []
        put_list X28
        set_constant 29
        set_value X29
        ...
        put_list X1
        set_constant 2
        set_value X2
        put_list A1
        set_constant 1
        set_value X1
        put_void A2                          (1)
        execute nreverse/2
nreverse/2:
        switch_on_term V1, L1, C1, fail
C1:     switch_on_constant 1, {([], N1)}
V1:     try_me_else M1
N1:     get_constant [], A1
        get_constant [], A2                  (2)
        proceed
M1:     trust_me
L1:     allocate
        get_list A1
        unify_variable Y2
        unify_variable X1
        get_variable Y3, A2
        put_value X1, A1
        put_variable Y1, A2                  (3)
        call nreverse/2, 3
        put_unsafe_value Y1, A1
        put_list A2
        set_value Y2
        set_constant []
        put_value Y3, A3
        deallocate
        execute append/3
append/3:
        switch_on_term V2, L2, C2, fail
C2:     switch_on_constant 1, {([], N2)}
V2:     try_me_else M2
N2:     get_constant [], A1
        get_value A2, A3                     (4)
        proceed
M2:     trust_me
L2:     get_list A1
        unify_variable X4
        unify_variable X1
        get_list A3                          (5)
        unify_value X4
        unify_variable X3                    (6)
        put_value X1, A1
        put_value X3, A3
        execute append/3

The optimized code is obtained by replacing: 

- “put_void A2” with “put_uninit_void A2” at line (1);

- “get_constant [], A2” with “get_uninit_constant [], A2” at line (2);

- “put_variable Y1, A2” with “put_uninit_variable Y1, A2” at line (3);

- “get_value A2, A3” with “get_uninit_value A2, A3” at line (4);

- “get_list A3” with “get_uninit_list A3” at line (5);

- “unify_variable X3” with “unify_uninit_variable X3” at line (6).

where new specialized instructions are defined as follows: 

put_uninit_void A2
    A2 = ⟨REF, H++⟩;
    P += instruction_size(P);

put_uninit_variable Yi, A2
    A2 = ⟨REF, &Yi⟩;
    P += instruction_size(P);

get_uninit_structure f/n, Ai
    ⟨REF, addrA⟩ = Ai;
    HEAP[H] = ⟨STR, H+1⟩;
    HEAP[H+1] = f/n;
    STORE[addrA] = ⟨STR, H+1⟩;
    H = H+2;
    mode = write;
    P += instruction_size(P);

get_uninit_list Ai
    ⟨REF, addrA⟩ = Ai;
    HEAP[H] = ⟨LIS, H+1⟩;
    STORE[addrA] = ⟨LIS, H+1⟩;
    H++;
    mode = write;
    P += instruction_size(P);

get_uninit_constant c, Ai
    ⟨REF, addrA⟩ = Ai;
    STORE[addrA] = ⟨CON, c⟩;
    P += instruction_size(P);

get_uninit_value Vn, Ai
    addrV = deref(Vn);
    ⟨REF, addrA⟩ = Ai;
    STORE[addrA] = STORE[addrV];
    P += instruction_size(P);

unify_uninit_variable X3
    X3 = ⟨REF, H++⟩;
    P += instruction_size(P);



where “&Yi” is the address of the stack location associated with Yi. The whole 
list of Extended WAM instructions for handling uninitialized variables can be 
found in [2]. 

3 Implementation and Integration of Uninitialized 
Variable Analysis 

We have implemented our uninitialized variable analysis using the Generic
Abstract Interpretation Analyzer GAIA [6]. We modified the parsing phase of the
original version of GAIA and we implemented our abstract domain and its
associated operations [2]. The parsing phase for input programs was modified
because uninitialized variable analysis is concerned only with the variables
occurring in a term and not with its compound subterms. Thus, we simplified
the parsing phase of the original GAIA following the normal form of Prolog
programs illustrated for nreverse/2 (Section 2).

The output of our analyzer is a sequence of lists, one for each clause of the
analyzed program, where the list associated with a clause specifies the WAM
instructions that can be optimized for that clause. For example, the list
associated with cl0 in P is: [main/0/1, [put_void(2)]].

In order to integrate our analyzer into SICStus, we modified the original
system in two points:

- we modified the SICStus Compiler in such a way that, using the lists
produced by our analyzer, it generates extended WAM code of the form shown
in Section 2.1;

- we also modified the emulator associated with the SICStus compiler in such
a way that it can execute the new instructions.

The modification of the SICStus Compiler is very simple. We inserted a
procedure at the point where the SICStus compiler is about to generate the WAM
code. Before generating any (potentially optimizable) WAM instruction for some
clause, that procedure checks whether the instruction is on the list produced
by our analyzer for that clause. If so, an optimized WAM instruction
(cf. [2]) is generated in place of the original one.
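
The lookup-and-replace step can be sketched as follows. This is only an illustration, not the procedure added to the SICStus compiler: the dictionary encoding of the analyzer output, the function name `specialize`, and the reading of an entry such as put_void(2) as "the put_void on argument register 2" are all assumptions made for the example.

```python
# Specialized counterparts of the WAM instructions, as named in the text.
SPECIALIZED = {
    "put_void": "put_uninit_void",
    "put_variable": "put_uninit_variable",
    "get_constant": "get_uninit_constant",
    "get_value": "get_uninit_value",
    "get_list": "get_uninit_list",
    "get_structure": "get_uninit_structure",
    "unify_variable": "unify_uninit_variable",
}


def specialize(clause_id, code, analysis):
    """Return the code of one clause with every instruction that the
    analyzer marked as optimizable replaced by its specialized version."""
    marked = analysis.get(clause_id, set())
    result = []
    for opcode, arg, *rest in code:
        # The test is performed for every potentially optimizable instruction.
        if (opcode, arg) in marked and opcode in SPECIALIZED:
            opcode = SPECIALIZED[opcode]
        result.append((opcode, arg, *rest))
    return result


# Hypothetical encoding of the analyzer output [main/0/1, [put_void(2)]].
analysis = {"main/0/1": {("put_void", 2)}}
code = [("put_list", 1), ("put_void", 2), ("execute", "nreverse/2")]
print(specialize("main/0/1", code, analysis))
```

Because the substitution is a per-instruction table lookup, it leaves the rest of code generation untouched, which is what keeps the change to the compiler small.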

Our strategy for integrating the SICStus compiler with our analyzer has two
main advantages: its simplicity and its generality. Its simplicity is also
shown by the fact that the described modification of the SICStus compiler
consists of the addition of 277 lines of Prolog code to the original SICStus
Compiler (precisely, in file plwam.p4).

As far as the generality of our approach is concerned, it is important to stress 
again, cf. the discussion on this point contained in the Introduction, that our 
approach is based on the fact that the optimization supported by a static analysis 
must not cause important changes in the basic WAM model. Observe that this 
is surely true in the case of uninitialized variables where the optimizations are 
in fact local to each clause. 

Clearly, our strategy is rather inefficient because every potentially
optimizable instruction needs to be tested. However, we think that its
simplicity and generality outweigh this shortcoming.

Since new instructions are generated by the modified SICStus compiler de- 
scribed above, we had to define new bytecodes for those new instructions as 
well as to modify the Emulator in order to handle properly their bytecodes. 
Both these modifications were rather simple to carry out. The file containing 
the bytecodes of WAM instructions (insdef.h) increased by 21 lines of C code. 
The modification of the SICStus Emulator amounts to 302 lines of C code added 
to files wam.c, wam.h, support.h, u2defs.h, and termdefs.h. 



4 Statistics 

In what follows we present tables evaluating our uninitialized variable analyzer 
and the modified SICStus compiler. Section 4.1 evaluates the impact, in terms 
of execution time, of our uninitialized variable analyzer within the modified 
SICStus Compiler. Also, Section 4.1 illustrates and compares the performance of 
our uninitialized variable analyzer with respect to the dataflow analyzer included 
in Aquarius [13]. Section 4.2 presents the results about the quality and the benefit 
of our uninitialized variable analysis. 



4.1 Performance of Uninitialized Variable Analysis 

We consider the execution time of our uninitialized variable analyzer and the
modified SICStus compiler for a set of benchmarks taken from [13]. In this way
we can compare our compiler and analyzer with those of Van Roy. A comparison
of the execution time of our uninitialized variable analyzer with that of
Lindgren was not possible because we did not have any information about the
performance of his analyzer.

Table 1 contains the benchmarks used in our statistics with a brief descrip- 
tion and their size measured in lines of code (not including comments), number 
of predicates, and clauses. The benchmarks in the first block are called small 
benchmarks whereas the other ones are large benchmarks. 



Small benchmarks   Lines  Preds  Clauses  Description
-----------------  -----  -----  -------  ---------------------------------------------------
divide10           27     3      10       Symbolic differentiation
fast_mu            54     7      17       An optimized version of the mu-math prover
log10              27     3      10       Symbolic differentiation
mu                 26     9      16       Prove a theorem of Hofstadter’s “mu-math”
nreverse           10     4      5        Naive reverse of a 30-integer list
ops8               27     3      10       Symbolic differentiation
poly_10            86     12     32       Symbolically raise a polynomial to the tenth power
qsort              19     4      6        Quicksort of a 50-integer list
queens_8           31     8      11       Solution of the 8 Queens Problem
query              68     6      54       Query a static database (with integer arithmetic)
serialise          29     8      13       Calculate serial numbers of a list
tak                15     2      3        Recursive integer arithmetic
times10            27     3      10       Symbolic differentiation
zebra              36     6      11       A logical puzzle based on constraints

Large benchmarks   Lines  Preds  Clauses  Description
-----------------  -----  -----  -------  ---------------------------------------------------
boyer              384    24     140      An extract from a Boyer-Moore theorem prover
browse             103    14     31       Build and query a database
chat_parser        1130   155    519      Parse a set of English sentences
flatten            158    28     58       Source transformation to remove disjunctions
meta_qsort         74     8      27       A meta-interpreter running qsort
nand               493    40     152      A logic synthesis program based on heuristic search
prover             81     10     32       A simple theorem prover
reducer            301    30     140      A graph reducer based on combinators
sdda               273    29     105      A dataflow analyzer that represents alias
simple_analyzer    443    67     135      A dataflow analyzer analyzing qsort
unify              125    28     55       A compiler code generator for unification

Table 1. Benchmarks



Each benchmark is run ten times and the arithmetic mean of the results is
taken. Table 2 contains the execution time of our uninitialized variable
analyzer alone and integrated within the SICStus Prolog Compiler. The fourth
column indicates the impact of our uninitialized variable analysis on the
compilation time and is obtained as the third column (multiplied by 100) over
the second one. These experiments were performed on a SparcStation Classic
with 40 MB of RAM and a MicroSparc processor, running SunOS 4.1.3.

The geometric mean of the ratio of analysis time to compilation time (with
analysis) is 34.2% and 37.0% for small and large programs, respectively. The
fact that small and large programs have similar means indicates that the
analysis scales well.
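
The two quantities used here are simple to compute. The following sketch uses hypothetical function names and takes three rows of Table 2 as sample input; it shows the ratio of a single row and a generic geometric mean, without attempting to reproduce the reported means.

```python
import math


def ratio(compilation_with_analysis, analysis):
    """Share of the compilation time (with analysis) spent in the analysis, in percent."""
    return 100.0 * analysis / compilation_with_analysis


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (compilation time with analysis, time of analysis) in seconds, from Table 2.
rows = {"divide10": (0.84, 0.29), "nreverse": (0.52, 0.09), "zebra": (0.75, 0.08)}
for name, (compile_time, analysis_time) in rows.items():
    print(name, round(ratio(compile_time, analysis_time), 1))
```

A geometric mean is the usual choice for averaging ratios, because a single benchmark with a very large ratio affects it less than it would affect an arithmetic mean.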

Benchmarks         Compilation Time       Time of Analysis    Ratio
                   with Analysis (sec.)   (sec.)
-----------------  ---------------------  ------------------  ------
divide10           0.84                   0.29                34.5%
fast_mu            1.22                   0.23                18.9%
log10              1.02                   0.39                38.2%
mu                 1.35                   0.46                34.1%
nreverse           0.52                   0.09                17.3%
ops8               0.98                   0.34                34.7%
poly_10            2.43                   0.75                30.9%
qsort              2.70                   1.45                53.7%
queens_8           1.30                   0.67                45.7%
query              1.86                   0.74                39.8%
serialise          2.10                   1.30                61.9%
tak                3.42                   2.99                87.4%
times10            1.01                   0.34                33.7%
zebra              0.75                   0.08                10.7%
Mean               1.54
Geometric Mean                                                34.2%

boyer              7.75                   2.61                33.7%
browse             2.55                   1.16                45.5%
chat_parser        27.43                  15.6                56.9%
flatten            3.65                   0.92                25.2%
meta_qsort         1.82                   0.47                25.8%
nand               18.07                  6.29                34.8%
prover             2.67                   0.97                36.3%
reducer            9.26                   3.59                38.8%
sdda               8.05                   3.60                44.7%
simple_analyzer    11.45                  4.44                38.8%
unify              6.83                   2.56                37.5%
Mean               9.05
Geometric Mean                                                37.0%

Table 2. Impact of our Analysis on Compilation Time

Table 3 contains the execution time of the uninitialized variable analysis
performed by the Aquarius Dataflow Analyzer. The compilation and analysis
times are obtained by executing Aquarius 1.0 on the considered benchmarks.
These experiments were performed on a SparcStation Classic with 40 MB of RAM
and a MicroSparc processor, running SunOS 4.1.3.



Benchmarks         Compilation Time       Time of Analysis    Ratio
                   with Analysis (sec.)   (sec.)
-----------------  ---------------------  ------------------  ------
divide10           17.5                   1.20                6.9%
fast_mu            72.3                   3.4                 4.7%
log10              17.4                   1.3                 7.5%
mu                 22.4                   1.3                 5.8%
nreverse           3.7                    1.0                 27.0%
ops8               17.3                   1.2                 6.9%
poly_10            103.1                  3.3                 3.2%
qsort              8.2                    1.2                 14.6%
queens_8           15.2                   1.7                 11.2%
query              16.8                   1.5                 8.9%
serialise          15.3                   1.4                 9.2%
tak                6.4                    1.1                 17.2%
times10            17.4                   1.2                 6.9%
zebra              31.7                   1.5                 4.7%
Mean               26.1
Geometric Mean                                                8.2%

boyer              282.6                  6.9                 2.4%
browse             42.9                   4.2                 9.8%
chat_parser        1084.0                 54.8                5.1%
flatten            71.9                   6.3                 8.8%
meta_qsort         45.8                   2.5                 5.5%
nand               699.7                  39.8                5.7%
prover             50.3                   2.8                 5.6%
reducer            1496.9                 11.4                0.8%
sdda               314.6                  8.5                 2.7%
simple_analyzer    349.6                  16.2                4.6%
unify              130.3                  11.7                9.0%
Mean               415.3
Geometric Mean                                                4.5%

Table 3. Impact of Van Roy’s Analysis on Compilation Time



Tables 2 and 3 allow us to establish the following points:

- both our analysis and Van Roy’s analysis scale well on all considered
benchmarks;

- the impact of our analysis on compilation time is greater than that of the
analysis by Van Roy, due to the fact that SICStus is an efficient compiler;

- in absolute terms, our compiler performs 40 times better than Van Roy’s.

We argue that our compiler is worthy of attention in that the analysis
accounts for only about one third of the whole compilation time, and the final
compilation times remain acceptable.



4.2 Quality and Benefit of our Analysis 

In terms of quality, on the considered benchmarks, our analysis computes
exactly the same information as those of Lindgren and Van Roy. We also
evaluated the benefit of the optimized code in terms of the execution time of
the compiled code. Table 4 gives, for each benchmark, the execution time of
both the optimized and the non-optimized (or simply, WAM) code, executed on a
300 MHz AuthenticAMD i586 PC with 36 MB of RAM running Red Hat Linux 5.1.²
The fifth column is the difference (multiplied by 100) between the third and
the fourth column, over the third column, i.e., Saving = 100 x
(WAM Code - Optimized Code) / WAM Code.
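
As a small worked example of the Saving formula, the following sketch (the function name `saving` is hypothetical) recomputes three entries from the execution times reported in Table 4.

```python
def saving(wam_time, optimized_time):
    """Relative execution-time saving of the optimized code, in percent."""
    return 100.0 * (wam_time - optimized_time) / wam_time


# (WAM code, optimized code) execution times in msec., taken from Table 4.
samples = {"divide10": (226.7, 215.6), "nreverse": (200.7, 191.1), "boyer": (603.5, 584.8)}
for name, (wam, optimized) in samples.items():
    print(name, round(saving(wam, optimized), 1))
```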

The executions of each benchmark are grouped into batches. We added to
every benchmark the following lines of Prolog code, where main(Nexec,Nbatch)
indicates that Nexec executions of the benchmark are repeated Nbatch times.
The execution time indicated in Table 4 is the arithmetic mean of the
execution times of the batches. For uniformity, we used Nbatch = 10, whereas
Nexec is indicated in the second column of Table 4.



main(Nexec, Nbatch) :-
    batch(Nexec, Nbatch, 0, 0, Tbatch, Ltime),
    write('Mean Among Batches:'), Mean is Tbatch/Nbatch, write(Mean), nl.

% batch(Nexec, Nbatch, Ndone, Tpart, Tbatch, Times): run the remaining batches,
% accumulating the total time in Tbatch and the per-batch times in Times.
batch(_, Nbatch, Nbatch, Tbatch, Tbatch, []).
batch(Nexec, Nbatch, Ndone, Tpart, Tbatch, [Tcurr|L]) :-
    Nleft is Nbatch - Ndone, Nleft > 0,
    statistics(runtime, _),
    batch(Nexec),
    statistics(runtime, [_, Tcurr]),     % time elapsed since the previous call
    Tpart1 is Tpart + Tcurr,
    Ndone1 is Ndone+1,
    write('Execution No.'), write(Ndone1),
    write('Time ='), write(Tcurr), nl,
    batch(Nexec, Nbatch, Ndone1, Tpart1, Tbatch, L).

% batch(N): execute the benchmark entry point main/0 N times.
batch(0).
batch(N) :- N > 0,
    main,
    N1 is N-1,
    batch(N1).

% [Prolog code of the benchmark]

where main/0 is the entry procedure of the considered benchmark.

² We ported SICStus 2.1 to Linux by adding option -G to the GNU m4
preprocessor, version 1.4.

Table 4 shows that the geometric mean of the saved execution time is higher
for small benchmarks than for large ones. One reason is that large benchmarks
make heavy use of built-ins, whose execution time is not affected by our
optimization. Also, we cannot compare the execution time of our optimized code
with the corresponding optimized code of Van Roy because it is not possible,
in the BAM, to perform only the uninitialized variable optimization: the
static analysis of the BAM also performs groundness and recursively
dereferenced variable analyses together with uninitialized variable analysis.
Thus, we cannot give a fair comparison with Aquarius.



Benchmarks         No. Executions  Execution Time (msec.)       Saving
                   (Nexec)         WAM Code    Optimized Code
-----------------  --------------  ----------  ---------------  ------
divide10           5000            226.7       215.6            4.9%
fast_mu            250             278.4       264.2            5.1%
log10              15000           252.4       240.3            4.8%
mu                 250             209.4       198.0            5.4%
nreverse           500             200.7       191.1            4.8%
ops8               5000            137.5       131.0            4.7%
poly_10            3               145.6       138.0            5.2%
qsort              250             149.6       142.7            4.6%
queens_8           50              178.5       168.9            5.4%
query              50              226.5       214.5            5.3%
serialise          300             144.5       137.1            5.1%
tak                1               183.5       174.4            5.0%
zebra              3               147.6       140.5            4.8%
Geometric Mean                                                  5.0%

boyer              1               603.5       584.8            3.1%
browse             1               816.6       789.7            3.3%
chat_parser        1               166.5       160.2            3.8%
flatten            100             152.4       147.7            3.1%
meta_qsort         50              319.6       306.8            4.0%
nand               10              270.6       259.2            4.2%
prover             200             212.7       202.9            4.6%
reducer            10              406.4       386.5            4.2%
sdda               100             313.4       303.4            3.2%
simple_analyzer    10              205.6       198.8            3.3%
unify              100             298.6       286.1            4.2%
Geometric Mean                                                  3.7%

Table 4. Benefits of our Analysis

5 Recursively Dereferenced Variables

The method presented in Section 3 for integrating a static analyzer detecting
uninitialized variables within SICStus can also be applied to extend SICStus
with recursively dereferenced variables. The next section presents the set of
instructions that extend the SICStus Compiler for handling recursively
dereferenced variables. We implemented the static analyzer described in [5]
for detecting recursively dereferenced variables, and we are currently
implementing the extension of SICStus using that static analyzer.

In what follows, by means of the program P given in Section 2, we will show
how recursively dereferenced variables can be detected using the analyses
given in both [5] and [13]. Both analyses start from predicate main/0 and
simulate SLD resolution [8]. Predicate nreverse/2 is called, in cl0, with both
its arguments recursively dereferenced. This is true because V1 is directly
bound to a list and V2 is free and unaliased.

Clearly, cl1 binds both arguments of nreverse/2 to the empty list and thus
they remain recursively dereferenced. For cl2, it is also immediate to see
that both arguments of the recursive call of nreverse/2 are recursively
dereferenced. However, some more thinking is needed to derive that both
arguments remain recursively dereferenced through the call of append/3.

In order to see this fact, the analyzer must discover that append/3 is always
called and exited with recursively dereferenced arguments. This can be
achieved by adding the information that, at any call of append/3, the first
argument of append/3 is recursively dereferenced and ground, the second
argument is recursively dereferenced and bound to a non-variable value, and
the third argument is free and unaliased. Let us explain how that information
is used to conclude that append/3 is always called and exited with recursively
dereferenced arguments. As regards cl3, V2 = V3 leaves V2 and V3 recursively
dereferenced because V3 is free and unaliased whereas V2 is bound to a
recursively dereferenced non-variable value. As far as cl4 is concerned,
V1 = [V4|V5], V3 = [V4|V6] leave V3 recursively dereferenced because V6 is
initially free and unaliased and V4 is ground and recursively dereferenced
(because V1 is ground and recursively dereferenced, and V3 is initially free
and unaliased).

From the definition of recursively dereferenced variables given before, it
follows directly that any static analysis aiming at inferring this information
also has to infer information about the variables that are free and unaliased
and the variables that are bound to ground values and non-variable values. In
fact, both analyses of recursively dereferenced variables we know of, i.e., our
analysis [5] and that of Van Roy [13], also compute this information.

6 Use of Recursively Dereferenced Variables 

The information that a variable is recursively dereferenced can be exploited,
for example, for optimizing WAM unification. The WAM instructions performing a
unification would dereference all program variables involved in that
unification, whereas we can avoid dereferencing all variables that are
recursively dereferenced.

The instructions we added to the WAM are a simplification of some existing
WAM instructions. For example, “get_rderef_value Vn, Ai” is derived from
“get_value Vn, Ai” by replacing the call to the WAM generic unification
procedure “unify” with “rderef_unify” described in Figure 1, where PDL is a
stack of addresses, pop and push are the usual stack operations, and “bind” is
given in [2].



get_value Vn, Ai

    unify(Vn, Ai);
    if (fail) backtrack;
    else P += instruction_size(P);

get_rderef_value Vn, Ai

    rderef_unify(Vn, Ai);
    if (fail) backtrack;
    else P += instruction_size(P);

unify(a1, a2 : address) {
    push(a1, PDL);
    push(a2, PDL);
    fail = FALSE;
    do {
        d1 = deref(pop(PDL));
        d2 = deref(pop(PDL));
        if (d1 != d2) {
            (t1, v1) = STORE[d1];
            (t2, v2) = STORE[d2];
            if (t1 == REF) bind(d1, d2);
            else switch (t2) {
                case REF: bind(d1, d2);
                          break;
                case CON: fail = (t1 != CON) || (v1 != v2);
                          break;
                case LIS: if (t1 != LIS) fail = TRUE;
                          else { push(v1, PDL);
                                 push(v2, PDL); }
                          break;
                case STR: if (t1 != STR) fail = TRUE;
                          else { f1/n1 = STORE[v1];
                                 f2/n2 = STORE[v2];
                                 if ((f1 != f2) || (n1 != n2))
                                     fail = TRUE;
                                 else
                                     for (i = 1; i <= n1; i++) {
                                         push(v1 + i, PDL);
                                         push(v2 + i, PDL); }
                          }
                          break;
            }
        }
    } while (!(empty(PDL) || fail));
}

rderef_unify(a1, a2 : address) {
    push(a1, PDL);
    push(a2, PDL);
    fail = FALSE;
    do {
        d1 = pop(PDL);
        d2 = pop(PDL);
        if (d1 != d2) {
            (t1, v1) = STORE[d1];
            (t2, v2) = STORE[d2];
            if (t1 == REF) bind(d1, d2);
            else switch (t2) {
                case REF: bind(d1, d2);
                          break;
                case CON: fail = (t1 != CON) || (v1 != v2);
                          break;
                case LIS: if (t1 != LIS) fail = TRUE;
                          else { push(v1, PDL);
                                 push(v2, PDL); }
                          break;
                case STR: if (t1 != STR) fail = TRUE;
                          else { f1/n1 = STORE[v1];
                                 f2/n2 = STORE[v2];
                                 if ((f1 != f2) || (n1 != n2))
                                     fail = TRUE;
                                 else
                                     for (i = 1; i <= n1; i++) {
                                         push(v1 + i, PDL);
                                         push(v2 + i, PDL); }
                          }
                          break;
            }
        }
    } while (!(empty(PDL) || fail));
}



Fig. 1. Unification 
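The difference between the two procedures of Figure 1 can be tried out on a toy store. The following Python sketch is only an illustration under stated assumptions: the store is a list of (tag, value) cells, an unbound variable is a REF cell that points to itself, a LIS cell points to two consecutive cells (head and tail, both of which are pushed), structures are omitted, and the names example_store and dereference are ours. With dereference=False the loop behaves like rderef_unify, which is safe only when both arguments are recursively dereferenced.

```python
REF, CON, LIS = "REF", "CON", "LIS"

def deref(store, a):
    """Follow REF chains until an unbound variable or a non-REF cell."""
    tag, val = store[a]
    while tag == REF and val != a:
        a = val
        tag, val = store[a]
    return a

def bind(store, d1, d2):
    """Bind whichever of the two cells is an unbound variable to the other."""
    if store[d1][0] == REF:
        store[d1] = (REF, d2)
    else:
        store[d2] = (REF, d1)

def unify(store, a1, a2, dereference=True):
    """Return True on success; dereference=False models rderef_unify."""
    pdl = [a1, a2]
    fail = False
    while pdl and not fail:
        d1, d2 = pdl.pop(), pdl.pop()
        if dereference:
            d1, d2 = deref(store, d1), deref(store, d2)
        if d1 == d2:
            continue
        (t1, v1), (t2, v2) = store[d1], store[d2]
        if t1 == REF or t2 == REF:
            bind(store, d1, d2)
        elif t2 == CON:
            fail = t1 != CON or v1 != v2
        elif t1 != LIS:          # t2 is LIS but t1 is not
            fail = True
        else:                    # both lists: compare heads and tails
            pdl += [v1, v2, v1 + 1, v2 + 1]
    return not fail

def example_store():
    return [
        (CON, "a"), (CON, "nil"),   # cells 0, 1: head and tail of [a]
        (LIS, 0),                   # cell 2: the list [a]
        (REF, 3), (CON, "nil"),     # cells 3, 4: head X (unbound) and tail of [X]
        (LIS, 3),                   # cell 5: the list [X]
    ]
```

Unifying cells 2 and 5 binds X to a in both modes, because every cell in this store is already dereferenced; the optimized variant simply skips the deref calls.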



The complete list of optimized instructions can be found in [2]. We remark
that our extension of the WAM is fairly simple. The corresponding extension of
the SICStus Compiler and Emulator can be carried out using an analyzer that
produces, for each clause of an input Prolog program, the list of WAM instructions
that can be optimized. Following the guidelines given in Section 3, we can then
insert a procedure at the point where the SICStus compiler is about to generate
the WAM code. Before generating any (potentially optimizable) WAM instruction
for some clause, this procedure checks whether the instruction is on the list
produced by our analyzer for that clause. If it is, an optimized WAM
instruction (cf. [2]) is generated in place of the original one. This shows that our
method for incorporating uninitialized variables within SICStus can also be
applied to introduce recursively dereferenced variables.
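The lookup performed by this procedure can be pictured with a few lines of Python. This is a hypothetical sketch, not the SICStus code: the function name emit, the shape of the analyzer output (a mapping from a clause identifier to the set of optimizable instruction positions), and the clause identifier format are all assumptions; only the get_value / get_rderef_value pair is taken from the text.

```python
# Optimized counterparts of WAM instructions (only this pair is named in the text).
OPTIMIZED = {"get_value": "get_rderef_value"}

def emit(clause_id, position, instruction, optimizable):
    """Return the instruction to generate at `position` of clause `clause_id`.

    `optimizable` is the (hypothetical) analyzer output: a dict mapping a
    clause identifier to the set of instruction positions that may be optimized.
    """
    opcode, *operands = instruction
    on_list = position in optimizable.get(clause_id, set())
    if on_list and opcode in OPTIMIZED:
        return (OPTIMIZED[opcode], *operands)
    return instruction
```

Instructions that are not on the analyzer's list are generated unchanged, so the unmodified compiler behaviour is the fallback.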

7 Conclusion 

We presented a general and simple method for integrating static analyses within
Prolog compilers based on the WAM. We illustrated our method by showing
the integration of uninitialized variable analysis in the SICStus Compiler
and Emulator. The experimental evaluation shows that our approach is rather
promising: the modified SICStus compiler is still reasonably efficient,
and the execution time of the optimized code is, on average, 4.4% less than
that of non optimized code. To show that this approach can be applied to
other static analyses, we also outlined the integration of recursively dereferenced
variable analysis in the SICStus Compiler and Emulator.

Currently, we are finishing the integration of the recursively dereferenced
variables analysis within SICStus. In the future, we plan to integrate other
analyses into SICStus, such as the indexing analysis and the pointer chain
analysis, i.e., the analysis estimating the length of the pointer chains that have to
be dereferenced in order to reach the value of a variable [11]. We will also
investigate the application of our method to the logic language Mercury [10]. In this
context it would be interesting to consider analyses such as variable liveness [9].

Abstract. The logic/functional language Mercury uses a strong, mostly
static type system based on polymorphic many-sorted logic. For effi- 
ciency, the Mercury compiler uses type specific representations of terms, 
and implements polymorphic operations such as unifications via generic 
code invoked with descriptions of the actual types of the operands. These 
descriptions, which consist of automatically generated data and code, are 
the main components of the Mercury runtime type information (RTTI) 
system. We have used this system to implement several extensions of 
the Mercury system, including an escape mechanism from static type 
checking, generic input and output facilities, a debugger, and automatic 
memoization, and we are in the process of using it for an accurate, native 
garbage collector. We give detailed information on the implementation 
and uses of the Mercury RTTI system as well as measurements of the 
space costs of the system. 



1 Introduction 

Many modern functional and logic programming languages have a strong static 
type system and support parametric polymorphism. For efficiency, since the 
types of almost all values are known at compile-time, it is desirable to specialize 
the representation of data for each type, rather than using a single representation 
for data of any type (as is typically done with dynamically typed languages). 
When the type is known statically, the compiler is able to emit the proper type- 
specific code to manipulate those values. In some cases, however, the compiler 
does not know the exact type of a value. For example, in a polymorphic predicate 
or function, the compiler may know that the type of a variable is list(T), but may
not know what type the type variable T is bound to, since that can vary from call 
to call. Nevertheless, for some operations it may still be necessary to examine 
the representation. 

In such circumstances, implementors have two main choices. One alternative 
is to create separate copies of the implementation for each possible type T can be 
bound to, thus restoring the compiler’s full knowledge of the types of variables. 
This is the approach taken for the implementation of generics in most imperative 
languages. Its advantage is execution speed, due to the exclusive use of
type-specific operations; the corresponding disadvantage is the cost in code space and
locality of the multiple copies, many of which will typically be used quite rarely.
Another significant disadvantage of this approach is that it makes separate
compilation much more difficult.

The other alternative is to have only one implementation, but make this 
one able to handle all the calls. This obviously requires callers to make available 
runtime type information (RTTI) about the actual type bound to T that can then 
be interpreted by a single implementation. The advantages of this alternative are 
its small space cost, and ease of separate compilation, while its disadvantage is 
the time costs of lookup and interpretation. 

In this paper we describe the RTTI system of Mercury, a purely declara- 
tive logic/functional language. Since most Mercury programs use polymorphism 
much more frequently than imperative language programs use generics, we be- 
lieve that the space cost of the first approach would be prohibitive. However, 
we also want the implementation to be fast, so we have settled on a hybrid of 
the approaches. This hybrid uses RTTI to allow us to get away with only one 
implementation of each polymorphic predicate, but the most frequently used 
operations (unification and comparison) do not require interpretation. Other, 
less frequently used operations do, since for them this is the proper space-time 
tradeoff. 

Since the system has RTTI, we make it available to users who may wish to 
perform type specific operations (e.g. pretty-printing) on terms of polymorphic 
types, as well as to system programmers working to implement new language 
features. In the last two years, we have extended the Mercury implementation 
with several features that require access to RTTI, some of which required us to 
extend the RTTI system. Automatic memoization requires detailed knowledge 
of the data representations of types to construct efficient indexes. The debugger 
needs similar knowledge in order to be able to print out the values of variables 
on demand, and our (as yet incomplete) native garbage collector needs it to be 
able to trace through and to copy terms. (The Mercury runtime system currently 
relies upon the Boehm conservative garbage collector for C [5].) 

The rest of this paper is organized as follows. Section 2 introduces the relevant 
aspects of the Mercury language and describes how the Mercury implementation 
represents terms. Section 3 describes, at a significantly deeper level of detail than 
most other papers on RTTI, the data structures we use to store RTTI and how 
the information in these data structures is made available both to relevant parts 
of the implementation and to programmers. Section 4 evaluates the space impact 
of our RTTI implementation. Section 5 presents comparisons with related work. 



2 Background 

2.1 Mercury 

Mercury is a pure logic/functional programming language intended for general
purpose large-scale programming. We will describe in detail the data
representation used by the Mercury implementation, but for an overview of Mercury we
refer the reader to the Mercury language reference manual [10], which is available
from the Mercury home page, http://www.cs.mu.oz.au/mercury/.

2.2 Data Representation 

The Mercury implementation uses a different, specialized representation for the 
terms of each type. This is possible because the Mercury compiler knows the 
types of almost all terms at compile time, and the few exceptions do not present 
insoluble problems; we will discuss the solutions of some of these problems later. 
The advantage of specializing term representations is that it reduces storage 
requirements somewhat and improves time efficiency considerably. The disad- 
vantage is that you cannot tell the value represented by a bit pattern without 
knowing what type it is. 

    :- type dir ---> north ; south ; east ; west.

    :- type example ---> a ; b(int, dir) ; c(example).

Types such as dir, in which every alternative is a constant, correspond to 
enumerated types in other languages. Mercury implements them as if they were 
enumerated types, representing the alternatives by consecutive small integers 
starting with zero. These integers are stored directly in machine words, as are 
values of builtin types that fit in a word, such as integers. Values of builtin types 
that do not fit in a word, e.g. strings and (on some machines) double precision 
floating point numbers, are stored in memory and represented by a pointer to 
that memory, to allow us to establish the invariant that all values fit into a word. 
Polymorphic code depends on this invariant; without it, the compiler could not 
generate a single piece of code that can pass around values of a type that is 
unknown at compile time. 

Types such as example, in which some alternatives have arguments, obviously 
need a different representation. One possible representation would be as a pointer 
to a memory block, in which the first word specifies the function symbol, and 
the later words contain its arguments, with the block being just big enough to 
contain these arguments. The size of the memory block may therefore depend 
on the identity of the function symbol. 

The Mercury implementation uses a more sophisticated and efficient variant 
of this representation. This implementation exploits the fact that virtually all 
modern machines, and all those we are interested in, address memory in bytes 
but access it in words. Many of these machines require words to be aligned on 
natural boundaries, and even the ones that don’t usually suffer a performance 
penalty when accessing unaligned words. The Mercury implementation therefore 
stores all data in aligned words on all machines. This means that the address of 
any Mercury data item will have zeros in its low-order 2 bits on 32-bit machines 
or low-order 3 bits on 64-bit machines. We can therefore use these bits, which we 
call primary tags, to distinguish between function symbols. (Mercury works on 
both 32-bit and 64-bit machines, but for simplicity of exposition, we will assume 
32-bit machines for the rest of this paper.)

In the case of type example, we assign the primary tag value 0 to a, the
primary tag value 1 to b, and the primary tag value 2 to c. Using primary
tags in this way allows us to reduce the size of the memory blocks we use,
since these no longer have to identify the function symbol. It also allows us to
avoid using a memory block at all for constant function symbols such as a, whose
representation therefore does not need a pointer, and in which we therefore
set the non-primary-tag bits of the word to zero.
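As a small illustration of this encoding, the following Python sketch builds and takes apart data words for type example on a 32-bit machine. The helper names (make_word, primary_tag, argument_block) and the sample addresses are ours and purely hypothetical; only the tag assignment a = 0, b = 1, c = 2 comes from the text.

```python
TAG_BITS = 2                       # low-order bits free on a 32-bit machine
TAG_MASK = (1 << TAG_BITS) - 1

# Primary tags assigned to the function symbols of type example.
PRIMARY_TAG = {"a": 0, "b": 1, "c": 2}

def make_word(functor, block=0):
    """Build the data word for a term whose top-level functor is `functor`.

    `block` is the word-aligned address of the argument block; the
    constant a needs no block, so its non-tag bits stay zero.
    """
    assert block & TAG_MASK == 0, "argument blocks are word aligned"
    return block | PRIMARY_TAG[functor]

def primary_tag(word):
    return word & TAG_MASK

def argument_block(word):
    return word & ~TAG_MASK
```

For example, a term b(...) whose arguments live at address 0x1000 is represented by the word 0x1001, while the constant a is simply the word 0.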

Of course, some types have more function symbols than a primary tag can 
distinguish. In such cases, some function symbols have to share the same pri- 
mary tag value. If all functors sharing the same primary tag value are constants, 
we distinguish them via the non-primary-tag bits of the word; we call this a 
local secondary tag. If at least one of them is not a constant, we distinguish 
them by storing an extra word at the start of the argument block; we call this 
a remote secondary tag. Both local and remote secondary tags are integers allo- 
cated consecutively from zero among the function symbols sharing the relevant 
primary tag. The compiler has a fairly simple algorithm that decides, for each 
function symbol, what primary tag value its representation has, and if that pri- 
mary tag value is shared, whether the secondary tag is local or remote and what 
its value is. To save both space and time, this algorithm will share a primary 
tag value only between several constants or several non-constants, and will not 
share between a constant and a non-constant. 
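The text does not spell out the allocation algorithm, so the following Python sketch is only one plausible reading of it, with every policy detail an assumption: all constants share the first primary tag (with local secondary tags when there is more than one), non-constants receive their own primary tags while these last, and the final primary tag is shared through remote secondary tags when there are too many non-constants. Pure enumerations, which the paper represents as plain integers, are outside the sketch.

```python
def assign_tags(functors, num_primary_tags=4):
    """Map each functor name to (primary_tag, kind, secondary_tag).

    `functors` is a list of (name, arity) pairs in declaration order.
    """
    constants = [name for name, arity in functors if arity == 0]
    others = [name for name, arity in functors if arity > 0]
    tags = {}
    ptag = 0
    if len(constants) == 1:
        tags[constants[0]] = (ptag, "unshared", None)
        ptag += 1
    elif constants:
        for stag, name in enumerate(constants):
            tags[name] = (ptag, "shared_local", stag)
        ptag += 1
    free = num_primary_tags - ptag
    if len(others) <= free:
        for name in others:
            tags[name] = (ptag, "unshared", None)
            ptag += 1
    else:
        # Keep the last primary tag for everything that does not fit.
        for name in others[:free - 1]:
            tags[name] = (ptag, "unshared", None)
            ptag += 1
        for stag, name in enumerate(others[free - 1:]):
            tags[name] = (ptag, "shared_remote", stag)
    return tags
```

Under these assumptions the sketch reproduces the assignment given for type example, and it never mixes constants and non-constants under one primary tag.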

Following pointers that have primary tags on their low order bits does not 
actually cost us anything in execution time. To make sense of the word retrieved 
through the pointer, the code must have already tested the primary tag on 
the pointer. (It does not make sense to look up an argument in a memory block 
without knowing what function symbol it is an argument of, and it does not make 
sense to look at a remote secondary tag without knowing what function symbols 
it selects among.) When following the tagged pointer, the implementation must 
subtract the (known) value of the primary tag and add the (known) offset in 
the pointed-to memory block of the argument or remote secondary tag being 
accessed. (Actually, remote secondary tags always have an offset of zero.) The 
two operations can be trivially combined into one, which means adding a possibly 
negative, but small constant to the pointer. On most machines, this is the most 
basic memory addressing mode; one cannot access memory faster any other way. 
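The folding of the two adjustments into one constant can be checked with a little arithmetic. The Python sketch below assumes 4-byte words (the 32-bit setting used in this paper); the function name field_address and the sample addresses are hypothetical.

```python
WORD_BYTES = 4   # 32-bit machine

def field_address(tagged_pointer, primary_tag, field_index):
    """Address of word `field_index` in the block a tagged pointer refers to.

    Subtracting the tag and adding the field offset are both compile-time
    constants, so they fold into a single (possibly negative) displacement.
    """
    displacement = field_index * WORD_BYTES - primary_tag
    return tagged_pointer + displacement
```

With a block at 0x2000 and primary tag 1, the tagged pointer is 0x2001; field 0 is reached with displacement -1 and field 2 with displacement +7, each a single base-plus-constant access.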



3 Run-Time Type Information 

One important design principle of the Mercury implementation, which we followed
during our design of the RTTI system, is the avoidance of “distributed
fat”, that is, implementation artifacts required by one language feature that
impose efficiency costs even when that feature is not used. In other words, we
don’t want the RTTI system to slow down any parts of the program that do 
not use RTTI. Of course, we also want the RTTI system to have good efficiency 
for the parts of the program that do use RTTI. The aspect of efficiency that 




228 Tyson Dowd et al. 



most concerns us is time efficiency; we are usually willing to trade modest and 
bounded amounts of space for speed. 

3.1 Describing Type Constructors 

The Mercury data representation scheme is compositional, i.e. the representation
of a list does not depend on the type of the elements of the list. Therefore the
runtime representation of a composite type such as tree(string, list(int)) can
be described by writing down the representation rules of the type constructors
occurring in the type and showing how these type constructors fit together. Since
a given type constructor will usually occur in many types, storing the information
about the type constructor just once and referring to it from the descriptions
of many types is obviously sensible. We call the data structure that holds all the
runtime type information about a type constructor a type_ctor_info. When
compiling a module, the Mercury compiler automatically generates a static data
structure containing a type_ctor_info for every type declaration in the
module. This data structure has a unique but predictable name derived from
the name of the type constructor, which makes it simple to include references to
it in other modules.

The type_ctor_info is a pointer to a vector of words containing the following
fields:

- the arity of the type constructor,
- the address of the constructor-specific unification procedure,
- the address of the constructor-specific index procedure,
- the address of the constructor-specific comparison procedure,
- a pointer to the constructor’s type_ctor_layout structure,
- a pointer to the constructor’s type_ctor_functors structure, and
- the module qualified name of the type constructor.

Like the type_ctor_info, the constructor-specific unification, index and
comparison procedures and the type_ctor_layout and type_ctor_functors structures
are also automatically generated by the compiler for each type declaration. We
will provide details on these fields later.

3.2 Describing Types 

A type is a type constructor applied to zero or more arguments, which are
themselves types. Due to the compositionality of data representation in Mercury,
the data structure that holds all the runtime type information about a type,
which we call a type_info, is a pointer to a vector of words containing

- a type_ctor_info pointer, and
- zero or more type_info pointers describing the argument types.

The number of other type_info pointers is given by the arity of the type
constructor, which can be looked up in the type_ctor_info. If the arity is zero,
this representation is somewhat wasteful, since it requires an extra cell and
imposes an extra level of indirection. The Mercury system therefore has an
optimization which allows the type_ctor_info for a zero-arity type constructor to
also function as a type_info for the type named by the type constructor. Code
that inspects a type_info now needs to check whether the type_info has the
structure just above or whether it is the type_ctor_info for a zero-arity type
constructor. Fortunately, the check is simple; whereas the first word of a real
type_info structure is a pointer and can never be null, the first word of the
type_ctor_info structure contains the arity of the constructor, and therefore
for a zero-arity constructor will always be null. This can be a worthwhile
optimization, because zero-arity types occur often; the leaves of every type tree are
type constructors of arity zero.^ Figure 1 shows the type_info structure of the
type tree(string, list(int)) with this optimization.

Fig. 1. The type_info structure for tree(string, list(int))
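The structure and the zero-arity check can be modelled in a few lines of Python. This is a toy model under our own assumptions: a type_ctor_info is reduced to the list [arity, name], a type_info is a list whose first element is a type_ctor_info, and "the first word is null" is modelled as "the first element is the integer 0". The function name describe is hypothetical.

```python
def make_ctor(name, arity):
    # Only two of the fields are modelled: the arity (word 0) and the name.
    return [arity, name]

STRING = make_ctor("string", 0)
INT = make_ctor("int", 0)
LIST = make_ctor("list", 1)
TREE = make_ctor("tree", 2)

def describe(type_info):
    """Render a type_info as text, applying the zero-arity check."""
    first = type_info[0]
    if isinstance(first, int):
        # First word is an arity, so this is a type_ctor_info standing in
        # for the type_info of a zero-arity type; the arity must be zero.
        assert first == 0
        return type_info[1]
    ctor, args = first, type_info[1:]
    assert len(args) == ctor[0]          # arity gives the number of arguments
    return f"{ctor[1]}({', '.join(describe(a) for a in args)})"

# type_info for tree(string, list(int)); the leaves reuse the
# type_ctor_infos of string and int directly.
TREE_STRING_LIST_INT = [TREE, STRING, [LIST, INT]]
```

Here describe(TREE_STRING_LIST_INT) walks one real type_info cell for tree/2 and one for list/1, while string and int need no cells of their own.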



3.3 Implementing Polymorphism 

In the presence of polymorphism, the compiler cannot always know the actual 
type of the terms bound to a given variable in a given predicate. If an argument 
of a polymorphic predicate is e.g. of type T, then for some calls the argument 
will be a term of type int, for others a term of type list(string), and so
on. The question then is: how can the compiler arrange the correct functioning 
of operations (such as unification) that depend on the actual type of the term 
bound to the variable? 

The answer is that the compiler can make available to those operations the 
type_info for the actual type. An early phase of the compiler inspects every 
predicate, and for each type variable such as T in the type declaration of the 
predicate, it adds an extra argument to the predicate; this argument will contain 
a type_info for the actual type bound to T. The same phase also transforms the 
bodies of predicates so that calls to polymorphic predicates set these arguments 
to the right values. 

^ On some modern architectures, mispredicted branches can be more expensive than 
memory lookups, which often hit in the cache. For these architectures, the Mercury 
compiler has a switch that turns off this optimization.

As an example, consider a predicate p1 that passes an argument of type
tree(string, list(int)) to a predicate p2 that expects an argument of type
tree(T1, T2). Since T1 and T2 are type variables in the signature of p2, the
compiler will add two extra arguments to p2, one each for T1 and T2. As the value
of the first of these extra arguments, p1 must pass the type_info for string; as
the value of the second of these extra arguments, p1 must pass the type_info
for list(int). If p1 does not already have pointers to the required type_info
structures, it must construct them. This means that while there can be only one
type_ctor_info structure for each type constructor, there may be more than
one type_info structure and therefore more than one type_info pointer for
each type.

If p2 wants to pass a value of type list(T1) to a predicate p3 that expects
a value of type U, p2 can construct the type_info structure expected by p3
even though the type bound to T1 is not known at compile time. To create this
type_info structure, the compiler simply emits code that creates a two-word
cell on the heap, and copies the pointer to the globally known type_ctor_info
for list/1 to the first word and the pointer it has to the type_info for T1 to
the second word.
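The effect of the transformation on p2 and p3 can be mimicked in Python, with the extra type_info arguments written out by hand. The sketch is illustrative only: the bodies of p2 and p3 are invented (p3 just returns the type_info it receives), and a type_ctor_info is reduced to the list [arity, name].

```python
LIST_CTOR = [1, "list"]        # [arity, name]; one static copy per constructor
INT_CTOR = [0, "int"]          # zero arity: doubles as the type_info for int
STRING_CTOR = [0, "string"]

def p3(type_info_U, value):
    """Stands for a predicate with one type variable U (body invented)."""
    return type_info_U

def p2(type_info_T1, type_info_T2, values):
    """Stands for p2 after the transformation: one extra argument per type variable."""
    # Two-word cell built at run time: the type_ctor_info of list/1,
    # followed by whatever type_info the caller supplied for T1.
    type_info_list_T1 = [LIST_CTOR, type_info_T1]
    return p3(type_info_list_T1, values)
```

A caller playing the role of p1 would invoke p2(STRING_CTOR, [LIST_CTOR, INT_CTOR], ...); p3 then receives a freshly built type_info for list(string) that shares the single LIST_CTOR.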

3.4 Implementing Unification and Comparison 

One operation that polymorphic predicates frequently perform on their
polymorphic arguments is unification (consider member/2). To unify two values whose
type it does not know, the compiler emits a call to unify/2, the generic unification
procedure in the Mercury runtime system. Since unify/2 is declared to take two
arguments of type T, the polymorphism transformation will transform calls to it
by adding an extra argument containing the type_info describing the common
type of the two original arguments.

The implementation of unify/2 consists of looking up the address of the 
unification procedure in the top-level type_ctor_info of the type_info, and
calling it with the right arguments. For builtin type constructors, the unification 
procedures are in the runtime system; for user-defined type constructors, they 
are automatically generated by the compiler. The technique the compiler uses 
for this is quite simple; it generates one clause for each alternative functor in 
the type constructor’s type declaration, and in each clause, it generates one 
unification for each argument of that functor. Here is one example of a type and 
its automatically generated unification predicate: 

    :- type tree(K, V) ---> leaf ; node(tree(K, V), K, V, tree(K, V)).

    unify_tree(leaf, leaf).
    unify_tree(node(L1, K1, V1, R1), node(L2, K2, V2, R2)) :-
        unify(L1, L2),
        unify(K1, K2),
        unify(V1, V2),
        unify(R1, R2).

After creating the unification predicate, the compiler optimizes it by
recognizing that for the first and last calls to unify, the top-level constructor of the
type is known, and that those calls can thus be replaced by calls to unify_tree
itself. Later still, the optimized predicate will go through the polymorphism
transformation, which yields the following code:

    unify_tree(TI_K, TI_V, leaf, leaf).
    unify_tree(TI_K, TI_V, node(L1, K1, V1, R1),
               node(L2, K2, V2, R2)) :-
        unify_tree(TI_K, TI_V, L1, L2),
        unify(TI_K, K1, K2),
        unify(TI_V, V1, V2),
        unify_tree(TI_K, TI_V, R1, R2).

This shows that when the generic unification predicate unify is called upon
to unify two trees, e.g. two terms of type tree(string, list(int)), two of
the arguments it must call unify_tree with are the type_infos of the types
string and list(int). It can do so easily, since the required type_infos are
exactly the ones following the type_ctor_info of tree/2 in the type_info
structure of tree(string, list(int)), a pointer to which was passed to unify as
its extra argument. (unify of course got the address of unify_tree from the
type_ctor_info of tree/2.)
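The dispatch just described can be modelled as follows. This Python sketch makes several simplifying assumptions: a type_ctor_info is a dict holding only the arity and the unification procedure, terms are Python values (the string "leaf" or a tuple ("node", L, K, V, R)), and unification is only the test of two ground terms for equality. The names unify_builtin and unify_list are ours.

```python
def unify(type_info, x, y):
    """Generic unification: look up the procedure in the type_ctor_info."""
    if isinstance(type_info, dict):           # zero-arity type_ctor_info
        return type_info["unify"](x, y)
    ctor, *arg_infos = type_info              # type_infos of the type arguments
    return ctor["unify"](*arg_infos, x, y)

def unify_builtin(x, y):
    return x == y

def unify_list(ti_elem, xs, ys):
    return len(xs) == len(ys) and all(unify(ti_elem, a, b) for a, b in zip(xs, ys))

def unify_tree(ti_k, ti_v, t1, t2):
    if t1 == "leaf" or t2 == "leaf":
        return t1 == t2
    _, l1, k1, v1, r1 = t1
    _, l2, k2, v2, r2 = t2
    return (unify_tree(ti_k, ti_v, l1, l2)    # constructor known: direct call
            and unify(ti_k, k1, k2)           # type unknown: generic call
            and unify(ti_v, v1, v2)
            and unify_tree(ti_k, ti_v, r1, r2))

STRING = {"arity": 0, "unify": unify_builtin}
INT = {"arity": 0, "unify": unify_builtin}
LIST = {"arity": 1, "unify": unify_list}
TREE = {"arity": 2, "unify": unify_tree}
TREE_STRING_LIST_INT = [TREE, STRING, [LIST, INT]]
```

Calling unify(TREE_STRING_LIST_INT, t1, t2) hands unify_tree exactly the two type_infos that follow TREE in the type_info, as in the transformed code above.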

Automatically generated comparison predicates call automatically generated
index predicates, which return the position of the top-level functor of a term in
the list of alternative functors of the type. This allows the top-level functors
of two terms to be compared for less than, equal to or greater than without
comparing each functor to every other functor. After this initial comparison, the
comparison code has a recursive structure similar to that of the code generated
for unification, and the polymorphism transformation is analogous.
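A sketch of this scheme for tree(K, V), again in Python and with the same toy term representation ("leaf" or a tuple ("node", L, K, V, R)). For brevity the sketch passes comparison functions for K and V directly instead of type_infos; that shortcut, the result values "<", "=" and ">", and the function names are our assumptions.

```python
def index_tree(term):
    """Position of the top-level functor in the declaration leaf ; node(...)."""
    return 0 if term == "leaf" else 1

def compare_builtin(x, y):
    return "<" if x < y else ">" if x > y else "="

def compare_tree(compare_k, compare_v, t1, t2):
    i1, i2 = index_tree(t1), index_tree(t2)
    if i1 != i2:                       # different functors: positions decide
        return "<" if i1 < i2 else ">"
    if i1 == 0:                        # both are leaf
        return "="
    def recurse(a, b):
        return compare_tree(compare_k, compare_v, a, b)
    # Same functor: compare the arguments from left to right.
    comparers = (recurse, compare_k, compare_v, recurse)
    for compare, a, b in zip(comparers, t1[1:], t2[1:]):
        result = compare(a, b)
        if result != "=":
            return result
    return "="
```

Only one integer comparison is needed to order two different functors, however many alternatives the type has.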

3.5 Interpreting Type-Specialized Term Representations

Some polymorphic predicates wish to perform operations on polymorphic val- 
ues for which there is no compiler-generated type-representation-specific code 
the way there is for unifications and comparisons. Copying terms and printing 
terms are examples of such operations. In such cases, the implementation of the 
operation must itself decode the meaning of a term in a type-specific data rep- 
resentation. Since it is the compiler that decides how values of each type are 
represented, this requires cooperation from the compiler. This cooperation takes
the form of a compiler-generated type_ctor_layout structure for each type
constructor, pointed to from the type_ctor_info structure of the constructor. Like
the type_ctor_info, the type_ctor_layout structure is static, and there is only
ever one type_ctor_layout for a given type constructor.

Since most values in Mercury programs belong to types which are discriminated
unions, we chose to optimize type_ctor_layout structures so that given
a word containing a value belonging to such a type, it is as efficient as possible
to find out what the term represented by that word is. The type_ctor_layout
is therefore a vector of descriptors indexed by the primary tag value of the data
word; it thus contains four descriptors on 32-bit machines and eight on 64-bit
machines. Each descriptor says how to interpret data words with the corresponding
primary tag. For types which are not discriminated unions (such as int), and
thus do not use primary tags, all the descriptors in the vector will be identical;
since there are few such types, and the vectors are small, this is not a problem.

To make the type_ctor_layout as small as possible, each descriptor is a single
tagged word; here we use a 2-bit descriptor tag regardless of machine
architecture. The value of this descriptor tag, which can be unshared, shared_remote,
shared_local or equivalence, tells us how to interpret the rest of the word.

If the value of the descriptor tag is unshared, then this value has a
discriminated union type and the primary tag of the data word uniquely identifies the
functor. The rest of the descriptor word is then a pointer to a functor descriptor
which contains

- the arity of the functor (n),
- n pseudo_type_infos for the functor arguments,
- a pointer to a string containing the functor name, and
- information on the primary tag of this functor, and its secondary tag, if any.

The last field is redundant when the functor descriptor is accessed via the
type_ctor_layout structure; it is used only when it is accessed via the
type_ctor_functors structure, which is discussed below in Section 3.6.

Many type declarations contain functors whose arguments are of polymorphic
type; for example, all the arguments of the functor node in our example
above contain a type variable in their type. For such an argument, the
type_ctor_layout structure, being static, cannot possibly contain the actual
type_info of the argument. Instead, it contains a pseudo_type_info, which is
a generalization of a type_info. Whereas a type_info is always a pointer to a
type_info structure, a pseudo_type_info is either a small integer that refers to
a type variable, or a pointer to a pseudo_type_info structure, which is exactly
like a type_info structure except that the fields after the type_ctor_info are
pseudo_type_infos rather than type_infos.

The functor descriptor for the functor node will contain the small integers
1 and 2 as its second and third pseudo_type_infos, standing for the type
variables K and V respectively, which are the first and second type variables in the
polymorphic type tree(K, V). The first and fourth pseudo_type_infos will
be pointers to pseudo_type_info structures in which the type_ctor_info slot
points to the type_ctor_info structure for tree/2 and the following two
pseudo_type_infos are the small integers 1 and 2. When a piece of code that has
a type_info for the type tree(string, list(int)) looks up the arguments of
the node functor, it will construct type_infos for the arguments by substituting
any pseudo_type_infos in the arguments (or in arguments of the arguments,
and so on) with their corresponding parameters, i.e. the type_infos for string
and for list(int), which are at offsets 1 and 2 in the type_info structure for
tree(string, list(int)). Note the exact correspondence between the offsets
and the values of the pseudo_type_infos representing the type variables.

We can distinguish between small integers and pointers by imposing an
arbitrary boundary between them. If the integer value of a word is smaller than a
given limit, currently 1024, then the word contains a small integer; if it is greater
than or equal to the limit, it is a pointer. This works because we can ensure that
all small integers are below this limit, in this case by imposing an upper bound
on the arities of type constructors, and because we can ensure that all pointers
to data are bigger than the limit. (The text segment comes before the data
segment, and the size of the text segment of the compulsory part of the Mercury
runtime system is above the limit; in any case, most operating systems make the
first page of the address space inaccessible in order to catch null pointer errors.)
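The substitution of pseudo_type_infos and the small-integer test can be tried out on a toy memory. In the Python sketch below, every word is an integer, a vector of words lives at an address handed out by a hypothetical Heap class whose addresses start at the limit of 1024, and a type_ctor_info is reduced to its arity followed by a name (stored as a Python string purely for readability). None of these modelling choices come from the Mercury runtime.

```python
LIMIT = 1024   # words below this are type variable numbers, not pointers

class Heap:
    """Toy word-addressed memory; every allocated address is >= LIMIT."""
    def __init__(self):
        self.cells = {}
        self.next_address = LIMIT

    def alloc(self, words):
        address = self.next_address
        self.cells[address] = list(words)
        self.next_address += len(words)
        return address

def instantiate(heap, pseudo, type_info):
    """Turn a pseudo_type_info into a type_info, given the enclosing type_info."""
    if pseudo < LIMIT:                       # small integer: a type variable
        return heap.cells[type_info][pseudo] # its value is also its offset
    cell = heap.cells[pseudo]
    if cell[0] == 0:                         # zero-arity type_ctor_info: ground
        return pseudo
    arguments = [instantiate(heap, word, type_info) for word in cell[1:]]
    return heap.alloc([cell[0]] + arguments)

heap = Heap()
STRING = heap.alloc([0, "string"])
INT = heap.alloc([0, "int"])
LIST = heap.alloc([1, "list"])
TREE = heap.alloc([2, "tree"])
LIST_INT = heap.alloc([LIST, INT])
TREE_TI = heap.alloc([TREE, STRING, LIST_INT])   # tree(string, list(int))
PSEUDO_TREE = heap.alloc([TREE, 1, 2])           # tree(K, V)
NODE_ARGS = [PSEUDO_TREE, 1, 2, PSEUDO_TREE]     # arguments of node
```

Instantiating NODE_ARGS against TREE_TI yields the type_infos of string and list(int) for the middle arguments and a newly built copy of the tree(string, list(int)) structure for the first and last.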

If the value of the descriptor tag is shared_remote, then this value has a
discriminated union type and the primary tag of the data word is shared between
several functors, which are distinguished by a remote secondary tag. The rest of
the descriptor word is then a pointer to a vector of words which contains

- the number of functors that share this tag (f), and
- f pointers to functor descriptors.

To find the information for the functor in the data word, we must use the
secondary tag pointed to by the data word to index into the vector of functor
descriptors.

If the value of the descriptor tag is sharedJocal, then there are three pos- 
sibilities: (a) this value has a discriminated union type and the primary tag of 
the data word is shared between several functors, which must all be constants 
because which are distinguished by a local secondary tag; (b) this value has an 
enumerated type, such as type example from 2.2; or (c) this value has a builtin 
type such as int or string. For alternative (c), the rest of the descriptor word 
is a small integer that directly identifies the builtin type. For alternatives (a) 
and (b), the rest of the descriptor word is a pointer to an enumeration vector, 
which contains 

- a boolean that says whether this is an enumeration type or not, and thus 
selects between (a) and (b), 

- s, the number of constants that share this tag (for (a)) or the number of 
constants in the entire enumeration type (for (b)), and 

- s pointers to strings containing the names of the constants. 

To find the name of the functor in the data word, we must use the local 
secondary tag in the data word (for alternative (a)) or the entire data word (for 
alternative (b)) to index into the vector of names. 
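
The two indexing rules can be sketched as below. The vector is modelled as a Python list, and the position of the primary tag in the low bits of the data word (tag_bits) is an assumption made only for this illustration.

```python
# Hypothetical model of an enumeration vector: [is_enum, s, name_0, ..., name_{s-1}].
def local_constant_name(enum_vector, data_word, tag_bits=2):
    """Return the name of the constant stored in data_word."""
    is_enum, count = enum_vector[0], enum_vector[1]
    names = enum_vector[2:2 + count]
    # Alternative (b): the entire data word is the index.
    # Alternative (a): strip the (assumed low-order) primary tag first.
    index = data_word if is_enum else data_word >> tag_bits
    return names[index]


colours = [True, 3, "red", "green", "blue"]
shared = [False, 2, "nil", "empty"]
print(local_constant_name(colours, 2))
print(local_constant_name(shared, (1 << 2) | 3))
```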

If the value of the descriptor tag is equivalence, then the value is either of 
a type that was declared as an equivalence type by the programmer, or it is of 
a no-tag type, a discriminated union type with one functor of one argument, 
which the compiler considers to be an equivalence type for purposes of internal 
although not external representation. Here is one example of each. 

:- type equiv(T1, T2) == foo(int, T2, T1).

:- type notag(T1, T2) ---> wrapper(foo(int, T2, T1)).

In the latter case, the compiler uses the same internal representation for
values of the types notag(T1, T2) and foo(int, T2, T1), just as it does for
the true equivalence. 

The rest of the descriptor word is a pointer to an equivalence vector, which
contains

- a flag saying whether this type is a no_tag type or a user-defined equivalence
type,

- a pseudo_type_info giving the equivalent type, and

- for no_tag types, a pointer to a string giving the name of the wrapper functor
involved.



3.6 Creating Type-Specialized Term Representations

A type_ctor_layout structure has complete information about how types with 
a given type constructor are represented. While the organization of this struc- 
ture is excellent for operations that want to interpret the representation of an 
already existing term, the organization is not at all suitable for operations that 
want to build new terms, such as parsing a term from an input stream. The 
type_ctor_functors table is an alternate organization of the same information
that is designed to optimize the operation of searching for the information
about a given functor. Like the type_ctor_info that points to it, the
type_ctor_functors structure is static, and there is only ever one
type_ctor_functors structure for a given type constructor.

The first word of the type_ctor_functors structure is an indicator saying
whether this type is a discriminated union, an enumeration type, a no_tag type,
an equivalence, or a builtin. The contents of the rest of the structure vary
depending on the indicator. For discriminated unions, the structure contains the
number of functors in the type, and a vector of pointers to the functor
descriptor for each functor. For enumerations, it contains a pointer to the enumeration
vector. For no_tag types, it has a pointer to the functor descriptor for its single
functor. For true equivalence types, it contains the pseudo_type_info for the
equivalent type. For builtin types, it contains the small integer that identifies
the builtin type.
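
To illustrate why this organization suits term construction, the following sketch searches such a structure for a functor by name and arity, as a parser would. The tuple and dictionary encodings and the field names ("name", "arity", "names") are hypothetical stand-ins for the static structures.

```python
# Hypothetical encoding: (indicator, payload...) mirroring the cases above.
def find_functor(functors, name, arity):
    """Return a descriptor for name/arity, or None if the type has no such functor."""
    kind = functors[0]
    if kind == "du":                      # (kind, count, descriptors)
        count, descriptors = functors[1], functors[2]
        for desc in descriptors[:count]:
            if desc["name"] == name and desc["arity"] == arity:
                return desc
        return None
    if kind == "enum":                    # (kind, enumeration_vector)
        names = functors[1]["names"]
        if arity == 0 and name in names:
            return {"name": name, "arity": 0, "ordinal": names.index(name)}
        return None
    if kind == "no_tag":                  # (kind, single_descriptor)
        desc = functors[1]
        return desc if (desc["name"], desc["arity"]) == (name, arity) else None
    return None  # equivalences and builtins are not handled in this sketch


tree = ("du", 2, [{"name": "leaf", "arity": 0}, {"name": "node", "arity": 3}])
print(find_functor(tree, "node", 3))
```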

3.7 Accessing RTTI from User Level Code 

A natural application of RTTI is dynamic typing [1]. The Mercury standard
library provides an abstract data type called univ which encapsulates a value
of any type, together with its type_info. The library provides a predicate 
type_to_univ for converting a value of any type to type univ. 

:- pred type_to_univ(T, univ).

:- mode type_to_univ(in, out) is det.

:- mode type_to_univ(out, in) is semidet.

Note that type_to_univ has two modes. The second (reverse) mode lets you
try to convert a value of type univ to any type; this conversion will fail if 
the value stored in the univ does not have the right type. The reverse mode 
implementation compares the type_info for T, which the compiler passes as an 
extra argument, with the type_info stored in the univ. 
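
The behaviour of the two modes can be modelled as follows. This is a sketch only: type_infos are represented as nested tuples, the expected type_info is passed explicitly where the Mercury compiler would pass it implicitly, and the name univ_to_type for the reverse mode is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Univ:
    type_info: tuple  # e.g. ("list", ("int",)); this encoding is assumed
    value: Any


def type_to_univ(type_info: tuple, value: Any) -> Univ:
    """Forward mode: always succeeds, pairing the value with its type_info."""
    return Univ(type_info, value)


def univ_to_type(expected_type_info: tuple, univ: Univ) -> Optional[tuple]:
    """Reverse mode: return (value,) if the type_infos match, None on failure."""
    if univ.type_info == expected_type_info:
        return (univ.value,)
    return None


u = type_to_univ(("list", ("int",)), [1, 2, 3])
print(univ_to_type(("list", ("int",)), u))
print(univ_to_type(("string",), u))
```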

In addition to this implicit use of RTTI, Mercury allows user programs to 
make explicit use of RTTI, by providing some RTTI types and operations on 
those types as part of the Mercury standard library. 

We provide abstract data types to represent runtime type information such 
as type_infos and type_ctor_infos. The operations on them include:

:- func type_of(T) = type_info.

:- func type_name(type_info) = string.

:- pred type_ctor_and_args(type_info::in, type_ctor_info::out,
    list(type_info)::out) is det.

:- pred functor(T::in, string::out, int::out) is det.

:- func argument(T::in, int::in) = (univ::out) is semidet.

The type_of function returns a type_info describing its argument. Its im- 
plementation is trivial: the compiler will pass the type_info for the type T as 
an extra argument to this function, and type_of can just return this extra ar- 
gument. 

Once you have a type_info, you can find out the name of the type it rep- 
resents; this is useful e.g. in giving good error messages in code manipulating 
values of polymorphic types. You can also special-case operations on some types, 
for purposes such as pretty-printing. You can also use type_ctor_and_args to 
decompose type_infos into their constituent parts. This is mostly useful in
conjunction with operations that decompose terms, such as functor and arg. Still
other operations are designed to allow programs to construct types (that is, 
type_infos at runtime) by combining existing type constructors in new ways, 
and to construct terms of possibly dynamically created types. 
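
As an illustration of how term-decomposing operations support generic code, here is a sketch of a generic printer built only on functor and argument. Terms are modelled as (name, args) pairs that carry their functor names directly, whereas the Mercury versions obtain this information from RTTI; the 0-based argument index is also an assumption of the sketch.

```python
def functor(term):
    """Return (name, arity) for a term; atomic Python values have arity 0."""
    if isinstance(term, tuple):
        name, args = term
        return name, len(args)
    return repr(term), 0


def argument(term, index):
    """Return the argument at index, or None to model semidet failure."""
    if isinstance(term, tuple) and 0 <= index < len(term[1]):
        return term[1][index]
    return None


def show(term):
    """Generic printer that never inspects the term except via the two operations."""
    name, arity = functor(term)
    if arity == 0:
        return name
    return name + "(" + ", ".join(show(argument(term, i)) for i in range(arity)) + ")"


print(show(("node", [("leaf", []), 1, ("leaf", [])])))
```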

3.8 Representing Type Information about Sets of Live Variables 

When a program calls io:print to pretty-print a term or io:read to read one
in, the polymorphism transformation passes the required type_info(s) to the 
predicate involved. This is possible because the predicate deals with a fixed 
number of polymorphic arguments and because the number of type variables in 
the types of those arguments is also known statically. 

However, in some cases we want one piece of code to be able to deal with 
arbitrary numbers of terms, which have an unknown number of type variables 
in their types. Two examples are the Mercury debugger and the Mercury native 
garbage collector. They both need to be able to interpret the representations of 
all live variables at particular points in the program, in the case of the debugger 
so that it can print out the values of those variables if the user so requests, and in 
the case of the garbage collector so that it can copy the values of those variables
from from-space to to-space. To handle this, at each program point that may be
of interest to the debugger or to the native collector, the compiler generates a 
data structure describing the set of live variables, and lets the debugger and the 
native collector interpret this description. Of course, if the compilation options 
do not request debugging information and if they request the conservative, rather 
than the native collector, there will be no such program points, and the data
structures we discuss in this subsection will not be generated. 

The debugger and the native collector both need to know how to walk the 
stacks (for printing the values of variables in ancestors for the debugger and 
because all live values in all stack frames are part of the root set for the native 
collector). For the nondet stack this is not a problem, since nondet stack frames 
store the pointer to the previous stack frame and the saved return address in 
fixed slots. However, frames on the det stack have no fixed slots, and they are 
of variable size. To be able to perform a step in a stack walk starting from a det 
stack frame, one must know how big the frame is and where within it the saved 
return address is. The compiler therefore generates a proc layout structure for 
each procedure, which includes 

- the address of the entry to this procedure 

- the determinism of this procedure (this controls which stack it uses) 

- the size of the stack frame 

- the location of the return address in the stack frame 

The stack frame size and saved return address location are redundant for 
procedures on the nondet stack, but it is simpler to include this information for 
all procedures. 

The debugger and the native collector both have their own methods for 
getting hold of the proc layout structure for the active procedure, and can thus 
find out what address the active procedure will return to. However, without 
knowing what procedure this return address is in, they won’t be able to take the 
next step in the stack walk. Therefore when debugging or the native collector 
is enabled, the compiler will generate a label layout table for every label that 
represents the return address of a call. Label layout tables contain: 

- a pointer to the proc layout structure for this procedure, 

- n, the number of live and “interesting” variables at the label, 

- a pointer to two consecutive n-element vectors, one containing 
pseudo_type_infos for the types of the live variables, and one containing
the descriptors of the locations of live variables, 

- a pointer to a vector of type parameter locations, the first element of which 
gives the number of type parameters and hence the size of the rest of the 
vector; as an optimization, the pointer will be null if the count is zero, and 

- a pointer to a vector of n offsets into a module-wide string table giving the 
variables’ names (this field is present only when debugging). 

The Mercury runtime has a table which can take the address of a label (such 
as a return address) and return a pointer to the label layout structure for that
label. That table, the proc layout structures and the first fields of label layout
structures together contain all the information required for walking the stacks. 
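
A walk over the det stack using these three sources of information might look like the sketch below. The stack is modelled as a Python list with the active frame at the high end, and the dictionary field names are hypothetical; only the kinds of information stored (entry, frame size, return address slot, pointer back to the proc layout) follow the description above.

```python
def walk_det_stack(stack, sp, proc_layout, label_table):
    """Yield the entry of each procedure on the stack, innermost first.

    stack: list of words; the active frame occupies stack[sp - size:sp].
    label_table: maps a return address to its label layout.
    """
    while proc_layout is not None:
        yield proc_layout["entry"]
        base = sp - proc_layout["frame_size"]
        return_address = stack[base + proc_layout["return_slot"]]
        label_layout = label_table.get(return_address)
        if label_layout is None:   # returning into code that has no layouts
            return
        proc_layout = label_layout["proc_layout"]
        sp = base                  # the caller's frame ends where ours begins


main_layout = {"entry": "main", "frame_size": 2, "return_slot": 0}
p_layout = {"entry": "p", "frame_size": 3, "return_slot": 1}
label_table = {"ret_in_main": {"proc_layout": main_layout}}
stack = ["ret_to_runtime", "x", "a", "ret_in_main", "b"]
print(list(walk_det_stack(stack, 5, p_layout, label_table)))
```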

The other fields in a label layout structure describe the “interesting” variables 
that are live at that label. Here the debugger and the native collector have related 
but slightly different requirements. The collector needs the type of all variables 
(including compiler introduced temporaries) but not their names, whereas the 
debugger needs names but does not need information about temporaries. If both 
are enabled, the label layout structure will contain the union of the information 
the two systems need. 

The debugger and native collector are also interested in somewhat different 
sets of labels. While both are interested in return labels, the debugger is also 
interested in labels representing entry to and exit from the procedure, and labels 
at program points that record decisions about the path of execution, e.g. the entry
points into the then parts and else parts of if-then-elses; these are irrelevant for 
the native collector. 

Each live, interesting variable at the label has an entry in two consecutive 
vectors pointed to by the label’s layout structure. The entry in one of the vectors 
gives the location of the variable. Some bits in this entry say whether the variable 
is in an abstract machine register, in a slot on the det stack, or in a slot on the 
nondet stack, while the other bits give the number of the register or the offset 
of the slot. The entry in the other vector is the pseudo_type_info for the type
of the variable. Before this pseudo_type_info can be used to interpret the value
of the variable, it must be converted into a type_info by substituting, for every
type variable in the pseudo_type_info, the type_info of the actual type bound
to the type variable. 

Consider a polymorphic predicate, one of whose arguments is of type list(T).
Its caller will pass an extra argument giving the type_info of the actual type
bound to T; this is the type_info that must be substituted into the
pseudo_type_info of the list(T) argument. Since the signature of the procedure
may include more than one type variable, each of which will have the
actual type bound to it specified by an extra type_info argument, the compiler
assigns consecutive integers, starting at 1, to all the type variables that occur
in the types of any of the arguments (actually, to all the type variables that
occur in the types of any of the variables of the procedure, which includes the
arguments), and makes the pseudo_type_infos in the vector of pairs refer to
each type variable by its assigned number. For every label that has a label layout
structure, the compiler takes the set of live, interesting variables, and finds
the set of type variables that occur in their types. For each such type variable,
the compiler then includes a description of the location of the type_info structure
for the actual type bound to it in the type parameter location vector of the label
layout structure, at the index given by the number assigned to the type variable.

That may sound complex, but to look up the type of a variable, one need only 
(a) convert the vector of type parameter locations in the label layout structure 
into an equal sized vector of type_infos, by decoding each location descriptor
and looking up the value stored at the indicated location, which will be
a type_info, and (b) use the resulting vector of type_infos to convert the
pseudo_type_info for the variable into the type_info describing its type, exactly
as we did in section 3.5 (except that the source of the type_info vector
that identifies the types bound to the various type variables is now different). 
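
Steps (a) and (b) can be sketched as follows. The encodings are assumptions for illustration: a location is a ("reg", n) or ("stack", offset) pair, a type variable in a pseudo_type_info is its assigned integer, and any other type is a tuple of a constructor name and its arguments.

```python
def lookup_location(location, registers, stack_frame):
    """Decode a location descriptor and fetch the value stored there."""
    kind, number = location
    return registers[number] if kind == "reg" else stack_frame[number]


def type_params(tvar_locations, registers, stack_frame):
    """Step (a): turn [count, loc_1, ..., loc_count] into a list of type_infos."""
    if tvar_locations is None:  # null pointer: no type parameters
        return []
    count = tvar_locations[0]
    return [lookup_location(loc, registers, stack_frame)
            for loc in tvar_locations[1:1 + count]]


def instantiate(pseudo, params):
    """Step (b): replace every type variable number by its type_info."""
    if isinstance(pseudo, int):  # type variables are numbered from 1
        return params[pseudo - 1]
    ctor, *args = pseudo
    return (ctor, *[instantiate(arg, params) for arg in args])


registers = {1: ("int",)}
params = type_params([1, ("reg", 1)], registers, {})
print(instantiate(("list", 1), params))
```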

One consequence of including information about the locations of variables
that hold type_infos in label layout structures is that the compiler must ensure
that a variable that holds the type_info describing the actual type bound to
a given type variable is live at a label that has a layout structure if any
variable whose type includes that type variable is live at that label. Normally,
the compiler considers every variable dead after its last use. However, when the
options call for the generation of label layouts, the compiler, as a conservative
approximation, considers a variable that holds a type_info to be live whenever
any variable whose type includes the corresponding type variable is live. This
rule of typeinfo liveness often extends the life of variables containing type_infos,
and sometimes prevents such variables from being optimized away.
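
The rule amounts to a simple extension of the live set, sketched below. The function and its three inputs are hypothetical simplifications of what the compiler tracks, not its actual data structures.

```python
def extend_liveness(live_vars, type_vars_of, typeinfo_var_for):
    """Add to live_vars the type_info variables that the live variables depend on.

    type_vars_of: maps a variable to the type variables occurring in its type.
    typeinfo_var_for: maps a type variable to the variable holding its type_info.
    """
    extended = set(live_vars)
    for var in live_vars:
        for type_var in type_vars_of.get(var, ()):
            extended.add(typeinfo_var_for[type_var])
    return extended


type_vars_of = {"Xs": {"T"}, "N": set()}
typeinfo_var_for = {"T": "TypeInfo_for_T"}
print(sorted(extend_liveness({"Xs", "N"}, type_vars_of, typeinfo_var_for)))
```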

4 Evaluation 

Hard numbers on RTTI systems are rare: there are few papers on RTTI, and 
many of these papers do not have performance evaluations of the RTTI system 
itself (although they often evaluate some feature enabled by RTTI). In this 
section we therefore provide some such numbers. 

The Mercury implementation depends on RTTI in very basic ways. We can- 
not just turn off RTTI and measure the speed of the resulting system, because 
without RTTI, polymorphic predicates do not know how to perform unification 
and comparison. We would have to remove all polymorphism from the program 
first. This would require a significant amount of development effort, particu- 
larly since many language primitives implemented in C cannot be specialized 
automatically. 

We therefore cannot report results on the exact time cost of the RTTI sys- 
tem. We can report two kinds of numbers though. First, a visual inspection of 
the C code generated by the Mercury compiler leads us to estimate that the frac- 
tion of the time that a Mercury program spends constructing type_infos and 
moving them around (all the other RTTI data structures are defined statically) 
is usually between 0 and 8%, probably averaging 1 to 3%. In earlier work [12], we
measured this cost as being less than 2% for a selection of small benchmarks. 
One reason why these numbers are small is that the Mercury compiler includes 
an optimization that removes unused arguments; most of the arguments thus 
removed are type_info structures. Second, the researchers working on HAL, a 
constraint logic programming language, have run experiments showing the speed 
impact of type-specialized term representations. They took some Prolog bench- 
mark programs, and translated them to HAL in two different ways: once with 
every variable being a member of its natural type, once with every variable being 
a member of a universal type that contained all the function symbols mentioned 
in the program. The HAL implementation, which compiles HAL programs into
Mercury, compiles each HAL type into its own Mercury type, so this distinction
is preserved in the resulting Mercury programs too. The experimental results [6] 
show that the versions using natural types, and therefore type-specific term rep- 
resentations, are on average about 1.4 times the speed of the versions using the 
universal type and thus a generic term representation. Since an implementation 
using a generic term representation can be made to work without RTTI but one 
using type-specific term representations cannot, one could read these results as 
indicating a roughly 40% speed advantage enabled by the use of RTTI. Due to 
the small number and size of the benchmarks involved in the experiments and 
the differences between native Mercury code and Mercury code produced by the 
HAL compiler, this number should be treated with caution. However, it seems 
clear that the time costs of RTTI are outweighed by the benefits it brings in 
enabling type-specific term representations. 

The parts of RTTI that are essential for polymorphism (and which also suffice 
to support dynamic casts, i.e. the univ type) are the type_info structures and 
the parts of type_ctor_info structures containing the type constructor arity and
the addresses of the unification and comparison procedures, i.e. the structures 
discussed up to but not including section 3.5. The structures discussed from 
that point forward, including the type_ctor_layout and type_ctor_functors
structures, are needed only for other aspects of the system, including generic 
term I/O, debugging, native garbage collection, and user-level access to type- 
specialized representations. Since all these structures are defined statically, they 
do not impact the runtimes of programs except through cache effects. 

To get an appreciation for space costs, we have measured the sizes of object 
files generated for the Mercury compiler and standard library, which are written 
in Mercury itself, compiled with various subsets of runtime type information. 
The compiler and library together consist of 225 modules and define about 950 
types and about 7650 predicates; they total about 206,000 lines of code. Our 
measurement platform was an x86 PC running Linux 2.0.36. 

Without any of the static data structures or automatically generated predi- 
cates described in this paper, the object files contain a total of 3666 Kb of code 
and 311 Kb of data. This version of the system cannot do anything that depends
on RTTI. The automatically generated unification and comparison predicates
add 462 Kb of code and 12 Kb of data to this; the static type_ctor_info
structures add another 120 Kb of code and 47 Kb of data on top of that.
(type_ctor_infos contain the addresses of unification and comparison procedures,
which prevents the compiler from optimizing those procedures away even
if they are otherwise unused; this is where the code size increase comes from.) 
This version of the system supports polymorphic unifications and comparisons, 
dynamic typing (with the univ type) and the type_name/1 function.

To support other RTTI-dependent operations, e.g. generic I/O, the system
needs the type_ctor_layout and type_ctor_functors structures as well.
Adding the type_ctor_layout structures alone adds 81 Kb of data, while adding 
the type_ctor_functors structures alone adds 78 Kb of data. However, since
these two kinds of structures share many of their components (e.g. functor
descriptors), adding both adds only 99 Kb of data.

If we wish to do something using stack layouts, the compiler must follow the 
rule of typeinfo liveness. This rule by itself adds 14 Kb of code and 17 Kb of data. 
This increase comes from the requirement to save type variables on the stack 
and to load them into registers more often (this must cause a slight slowdown, 
but this slowdown is so small that we cannot measure it). This brings us up to 
4247 Kb of code and 469 Kb of data, for a total size of 4747 Kb. We will use 
this as the baseline for the percentage figures below. 

Adding the stack layouts themselves has a much more substantial cost. 
Switching from conservative gc to native gc increases code size by 822 Kb and 
data size by 2229 Kb; total system size increases by 64% to 7798 Kb. The increase 
in code size is due to the native collector’s requirement that certain optimiza- 
tions which usually reduce code size be turned off; the increase in data size is 
due to the label layout structures. Sticking with conservative gc but adding full 
debugging support increases code size by 6438 Kb and data size by 5248 Kb;
total system size increases by 246% to 16433 Kb. The code size increase is much 
bigger because debugging inserts into the code many calls to the debugger entry 
point [9] (it also turns off optimizations that could confuse the user). The data 
size increase is much bigger because debugging needs label layout structures at 
more program points (e.g. the entry point of the then part of an if-then-else),
and because it needs the names of variables. 
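
The totals and percentages quoted for the two configurations follow directly from the 4747 Kb baseline and the stated increases, as the short check below shows. The helper name growth is hypothetical; all numbers are taken from the text.

```python
BASELINE_KB = 4747  # total size used as the baseline in the text


def growth(code_kb, data_kb, baseline_kb=BASELINE_KB):
    """Return (new total size in Kb, percentage increase rounded to an integer)."""
    added = code_kb + data_kb
    return baseline_kb + added, round(100 * added / baseline_kb)


print(growth(822, 2229))    # native gc: (7798, 64)
print(growth(6438, 5248))   # full debugging: (16433, 246)
```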

We already use several techniques for reducing the size of the static structures 
generated by the Mercury compiler, most of which are related to RTTI. The most 
important such technique we have not covered earlier in the paper is looking 
for identical static structures in each module and merging them into a single 
structure. Merging identical static structures in different modules would yield 
a further benefit, but since we want to retain separate compilation, it would 
require significant extensions to our compilation environment. Another potential 
optimization we could implement is merging two structures whenever one is a 
prefix of the other. 
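
Merging identical structures within a module can be sketched as a table keyed on structure contents. The representation of structures as named tuples of words is an assumption for illustration, and the sketch does not attempt the prefix-merging optimization.

```python
def merge_identical(structures):
    """Map each structure name to the name of the first structure with equal contents.

    structures: iterable of (name, contents) pairs with hashable contents.
    """
    canonical = {}  # contents -> name of the first structure seen
    alias = {}
    for name, contents in structures:
        alias[name] = canonical.setdefault(contents, name)
    return alias


module_data = [("layout_a", (1, 2)), ("layout_b", (3,)), ("layout_c", (1, 2))]
print(merge_identical(module_data))
```

References to layout_c can then be redirected to layout_a, so only two structures need to be emitted.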



5 Related work 

We expect that techniques and data structures at least somewhat similar to the 
ones we have described have been and/or are being used in the implementations 
of other mostly-statically typed languages (e.g. SML, Haskell and Algol 68; see 
the references cited in [8]). However, it is difficult to be sure, since papers that 
discuss RTTI implementations at any significant level of detail are few and far 
between. The exceptions we know of all deal with garbage collection of strongly 
typed languages, using the (obvious) model of walking the stack, finding out what 
the types of the live variables are in each frame and then recursively marking 
their values. 

Goldberg [8] describes a system, apparently never implemented, that associates
garbage collection information with each return point; this information
takes the form of a compiler-generated function for tracing all the live variables
of the stack frame being returned to. To handle polymorphism, it has
these functions pass among themselves the addresses of other functions that
trace single values (e.g. a function for tracing a list would take as an argument
a function for tracing the list elements). This is a less general solution
than our pseudo_type_infos. The garbage collection and tracing functions are
single-purpose and are likely to be much bigger than our layout structures.

Tolmach [14] describes how, by using explicit lazily computed type parame- 
ters that describe the type environment (set of type bindings) of a function, one 
can simplify the reconstruction of types. 

The TIL compiler for SML [13] uses a similar scheme but eagerly evaluates 
type parameters, making it quite similar to the combination of tables and type- 
info parameters used by the Mercury compiler, except for TIL’s use of type tags 
on heap-allocated data. Unfortunately the paper lacks a detailed description of 
the data representations and runtime behaviour of the type information gener- 
ated by the TIL compiler, and is unclear about whether this information can be 
used for purposes other than garbage collection. 

Aditya et al. [3,2] describe a garbage collector and debugger for Id that has an
approach to RTTI that is similar to ours, the main difference being that in their 
system, callers of polymorphic functions do not pass type information to the 
callee; instead, the garbage collector or debugger searches ancestor stack frames 
for type information when necessary. Although this scheme avoids the cost of 
passing type information around, we have not found this cost to be significant. 
On the other hand, the numbers in [3] show that propagating type information 
in the stack is quite expensive for polymorphic code. This is probably not the 
right tradeoff for Mercury, since we want to encourage programmers to write 
polymorphic code. 

In the logic programming field, Kwon et al. [11] and Beierle et al. [4] both
describe schemes for implementing polymorphically typed logic programming
languages with dynamic type tests. Their schemes both extend the WAM; both
annotate the representations of unbound variables with type information and add
additional WAM instructions for handling typed unification. But we believe that
an approach which is based on a high-level program transformation, like our handling
of type_infos, is simpler than one which requires significant modifications
to the underlying virtual machine. Neither scheme makes use of type-specific 
term representations. 

None of the papers cited above gives measurements of the storage costs
of their schemes. Future comparisons of space usage and type reconstruction 
performance between Mercury and the systems described in those papers may 
yield interesting results. 

Elsman [7] uses a transformation very similar to the one we use to introduce 
type_infos, for a very similar purpose: to enable type-specific data representa- 
tions; the performance benefits he reports broadly match our experience. How- 
ever, his system, whose purpose is the efficient handling of polymorphic equality 
types in ML, only passes around the addresses of equality functions, not comparison
functions, type names, or layout information. As such, his system constitutes
a very limited form of RTTI that is useful only for polymorphic equality, not for 
dynamic types, garbage collection, debugging, etc. 



6 Conclusion 

Our results show that a run time type information system can be added to 
Mercury without compromising the speed of the basic execution mechanism, 
and with relatively small space overheads in most cases. The RTTI system allows 
many useful extensions both to the language and to the implementation. 

In future work, we would like to explore the tradeoffs between table-driven 
generic operations and specialized code. Trends in microprocessor design, in par- 
ticular the increasing relative costs of mispredicted branches and cache misses, 
mean that it is quite possible that a generic unification routine using 
type_ctor_info structures may now be faster than the automatically generated
procedures we now use. However, executing code is inherently more flexible
than interpreting fixed-format tables. At the moment, we take advantage of this 
in our implementation of types with user-defined equality theories, for which we 
simply override the pointer to the automatically generated unification procedure 
with a pointer to the one provided by the user. Such a facility would still need 
to be provided even in a system that used table-driven unification. 
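
The override mechanism can be pictured as follows. The dictionary standing in for a type_ctor_info, the example type, and both unification functions are hypothetical; the sketch only shows that generic code calls whatever procedure the structure points to.

```python
def default_unify(x, y):
    """Stand-in for an automatically generated, structural unification procedure."""
    return x == y


def unordered_unify(x, y):
    """Stand-in for a user-defined equality: lists are equal up to element order."""
    return sorted(x) == sorted(y)


# Hypothetical type_ctor_info for a set-as-list type with user-defined equality.
set_ctor_info = {"name": "set_as_list", "arity": 1, "unify": default_unify}
set_ctor_info["unify"] = unordered_unify  # override the generated procedure


def generic_unify(type_ctor_info, x, y):
    """Polymorphic code unifies by calling through the stored pointer."""
    return type_ctor_info["unify"](x, y)


print(generic_unify(set_ctor_info, [1, 2], [2, 1]))
```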

To make table-driven generic operations more competitive, we are in the pro- 
cess of simplifying our data structures. At the moment, the information about 
what kind of type the type constructor represents (a discriminated union type, 
an equivalence type, a no_tag type, an enumeration type or a builtin type) is scat- 
tered in several different parts of the type_ctor_layout structure and its compo- 
nents (e.g. functor descriptors), even though this information is available directly 
in the type_ctor_functors structure. The reason for this is that initially, the
Mercury RTTI system only had type_ctor_layouts; type_ctor_functors were
added later. The design we are moving towards puts the type kind directly into
the type_ctor_info, and specializes the rest of the type_ctor_info according
to the type kind (e.g. type_ctor_layout and type_ctor_functors structures
will be present only if the type is a discriminated union type).

We would like to thank the Australian Research Council for their support.

Abstract. Despite extensive theoretical work on process-calculi, virtual
machine specifications and implementations of actual computational
models are still scarce. 

This paper presents a virtual machine for a strongly typed, polymor- 
phic, concurrent, object-oriented programming language based on the 
TyCO process calculus. The system runs byte-code files, assembled from
an intermediate assembly language representation, which is in turn gen- 
erated by a compiler. Code optimizations are provided by the compiler 
coupled with a type-inference system. The design and implementation of
the virtual machine focuses on performance, compactness, and architec- 
ture independence with a view to mobile computing. The assembly code 
emphasizes readability and efficient byte code generation. The byte code 
has a simple layout and is a compromise between size and performance. 
We present some performance results and compare them to other languages
such as Pict, Oz, and JoCaml.

Keywords: Process-Calculus, Concurrency, Abstract-Machine, Imple- 
mentation. 



1 Introduction 

In recent years researchers have devoted great effort to providing semantics
for pure concurrent programming languages within the realm of process-calculi.
Milner, Parrow and Walker’s π-calculus or an equivalent asynchronous formulation
due to Honda and Tokoro has been the starting point for most of these
attempts [9,17]. 

In this paper we use Vasconcelos’ Typed Concurrent Objects to define TyCO, 
a strongly typed, polymorphic, concurrent, object-oriented language [23,25]. 
Typed Concurrent Objects is a form of the asynchronous π-calculus featuring
first class objects, asynchronous messages, and template definitions. The calculus
formally describes the concurrent interaction of ephemeral objects through
asynchronous communication. Synchronous communication can be implemented
with continuations. Templates are specifications of processes abstracted on a sequence
of variables allowing, for example, for classes to be modeled. Unbounded
behavior is modeled through explicit instantiation of recursive templates. A type
system assigns monomorphic types to variables and polymorphic types to template
variables [25]. Other type systems have been proposed that support non-uniform
object interfaces [20]. The calculus is reminiscent of Abadi and
Cardelli’s ς-calculus in the sense that objects are sums of labeled methods attached
to names, the self parameters, and messages can be seen as asynchronous
method invocations [3].

TyCO is a very low-level programming language with a few derived constructs 
and constitutes a building block for higher level idioms. We are interested in 
using TyCO to study the issues involved in the design and implementation of 
languages with run-time support for distribution and code mobility. In this paper 
we focus on the architecture and implementation of a sequential run-time system 
for TyCO. Introducing distribution and mobility is the focus of a cooperating 
project [24]. Our long term objectives led us to the following design principles: 

1. the system should have a compact implementation and be self-contained; 

2. it should be efficient, running close to languages such as Pict [19], Oz [16] or 
JoCaml [2]; 

3. the executable programs must have a compact, architecture independent 
format. 

The architecture of the run-time system is a compact byte-code emulator with 
a heap for dynamic data-structures, a run-queue for fair scheduling of byte-code 
and two stacks for keeping local variable bindings and for evaluating expressions. 
Our previous experience in parallel computing makes us believe that more com- 
pact designs are better suited for concurrent object-oriented languages, whether 
we want to explore local and typically very fine grained parallelism or concur- 
rency, or evolve to mobile computations over fast heterogeneous networks where 
the latencies must be kept to the lowest possible. 

The remainder of the paper is organized as follows: section 2 introduces the 
TyCO language; section 3 describes the design and some implementation details 
of the run-time system; section 4 describes the optimizations implemented 
in the current implementation; section 5 presents some performance figures 
obtained with the current implementation; and finally, sections 6 and 7 respectively 
overview some related work and present some conclusions and future research 
issues. 

2 Introducing TyCO 

TyCO is a strongly, implicitly typed concurrent object-oriented programming 
language based on a predicative polymorphic calculus of objects [23,25]. TyCO 
is a kernel language for the calculus, and grows from it by adding primitive types 
with a set of basic operations, and a rudimentary I/O system. 

In the sequel we introduce the syntax and semantics of TyCO. The discussion 
is much abbreviated due to space constraints. For a fully detailed description the 
reader may refer to the language definition [23]. 

Syntax Overview The basic syntactic categories are: constants (booleans and 
integers), ranged over by $c, c', \ldots$; value variables, ranged over by $x, y, \ldots$; 
expressions, ranged over by $e, e', \ldots$; labels, ranged over by $l, l', \ldots$; and template 
variables, ranged over by $X, Y, \ldots$. Let $\tilde{x}$ denote the sequence $x_1 \cdots x_k$, with 
$k \geq 0$, of pairwise distinct variables (and similarly for $\tilde{e}$ where the expressions 
need not be distinct). Then the set of processes, ranged over by $P, Q, \ldots$ is given by 
the following grammar. 



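$$
\begin{array}{rcll}
P & ::= & \mathsf{inaction} & \text{terminated process} \\
  & \mid & P \mid P & \text{concurrent composition} \\
  & \mid & \mathsf{new}\ x\ P & \text{channel declaration} \\
  & \mid & x\,!\,l[\tilde{e}] & \text{message} \\
  & \mid & x\,?\,M & \text{object} \\
  & \mid & X[\tilde{e}] & \text{instance} \\
  & \mid & \mathsf{def}\ D\ \mathsf{in}\ P & \text{recursion} \\
  & \mid & \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ P & \text{conditional} \\
  & \mid & (P) & \text{grouping} \\
D & ::= & X_1(\tilde{x}_1) = P_1\ \mathsf{and}\ \ldots\ \mathsf{and}\ X_k(\tilde{x}_k) = P_k & \text{template declaration} \\
M & ::= & \{\, l_1(\tilde{x}_1) = P_1, \ldots, l_k(\tilde{x}_k) = P_k \,\} & \text{methods} \\
e & ::= & e_1\ op\ e_2 \mid op\ e \mid x \mid c \mid (e) & \text{expressions}
\end{array}
$$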



Some restrictions apply to the above grammar, namely: a) no collection of 
methods may contain the same label twice; b) no sequence of variables in a 
template declaration or collection of methods may contain the same variable 
twice, and; c) no declaration may contain the same template variable twice. 

A method labeled $l_i$ in an object $x\,?\,\{l_1(\tilde{x}_1) = P_1, \ldots, l_k(\tilde{x}_k) = P_k\}$ is selected 
by a message of the form $x\,!\,l_i[\tilde{e}]$; the result is the process $P_i$ where the variables 
in $\tilde{x}_i$ are replaced by the values of the expressions in $\tilde{e}$. This form of reduction 
is called communication. Similarly, an instance $X_i[\tilde{e}]$ selects the template $X_i$ in 
a template declaration $X_1(\tilde{x}_1) = P_1$ and $\ldots$ and $X_k(\tilde{x}_k) = P_k$; the result is the 
process $P_i$ where the variables in $\tilde{x}_i$ are replaced by the values of the expressions 
in $\tilde{e}$. This form of reduction is called instantiation. 

We let the scope of variables, introduced with new, extend as far to the right 
as possible, i.e., up to the end of the current composition of processes. We single 
out a label — val — to be used in objects with a single method. This allows us 
to abbreviate the syntax of messages and objects. Single branch conditionals are 
also defined from common conditionals with an inaction in the else branch. 

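$$
\begin{array}{rcl}
x\,![\tilde{e}] & \equiv & x\,!\,\mathsf{val}[\tilde{e}] \\
x\,?(\tilde{y}) = P & \equiv & x\,?\,\{\mathsf{val}(\tilde{y}) = P\} \\
\mathsf{if}\ e\ \mathsf{then}\ P & \equiv & \mathsf{if}\ e\ \mathsf{then}\ P\ \mathsf{else}\ \mathsf{inaction}
\end{array}
$$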



To illustrate the programming style and syntax of the language we sketch a 
simple example: a single element polymorphic cell. We define a template object 
with two attributes: the self channel and the value u itself. The object has two 
methods: one for reading the current cell value and another to change it. The 
recursion keeps the cell alive. 

def Cell( self, u ) = 
  self ? { 
    read( r ) = r ! [u] | Cell[self, u], 
    write( v ) = Cell[self, v] 
  } 
in new x Cell[x, 9] | new y Cell[y, true] 

The continuation of the definition instantiates an integer cell and a boolean 
cell at the channels x and y, respectively. 

3 The Virtual Machine 

The implementation of the virtual machine is supported by a formal specification 
of an abstract machine for TyCO [15]. This abstract machine grows from Turner's 
abstract machine for Pict [22], but modifies it in the following major ways: 

1. objects are first class entities and substitute input processes. Objects are 
more efficient than Pict's encoding in π [26] both in reduction and heap 
usage; 

2. we use recursion instead of replication for persistence. This allows a cleaner 
design of the abstract machine - no need for distinct ? and ?* rules, and 
allows a more rational heap usage; 

3. we introduce a new syntactic category - the thread - that represents the 
basic schedulable and runnable block in the abstract machine. Threads are 
identified as bodies of template definitions or method implementations; 

4. threads cannot be suspended. With this property, our objects are very akin 
to actors and provide a good model for object oriented concurrent lan- 
guages [4,5]. This choice, along with the previous item, also simplifies the 
treatment of local bindings, introduced with new statements, and the man- 
agement of environments. 

The abstract machine is sound, i.e., every state transition in the abstract 
machine can be viewed as a reduction or a congruence between their process 
encodings in the base calculus [15]. It also features important run-time properties 
such as: a) at any time during a computation the queues associated with names 
are either empty or hold only communications or only method-closures [22]; b) for 
well-typed programs the abstract machine does not deadlock. This property is 
linked intimately to the ability of the type system to guarantee that no run-time 
protocol errors will occur, and; c) the machine is fair, in the sense that every 
runnable thread will be executed in a constant time after its creation. 

The virtual machine closely maps the formal specification and executes TyCO 
programs quite efficiently. 

Fig. 1. Memory Layout of the Virtual Machine 



The Memory Layout The virtual machine uses five logically distinct memory 
areas (figure 1) to compute. 

The program area keeps the byte-code instructions to be executed. The byte- 
code is composed of instruction blocks and method tables (sequences of pointers 
to byte-code blocks). 

Dynamic data-structures such as objects, messages, channels and builtin val- 
ues are allocated in the heap. The basic building block of the heap is a machine 
word. The basic allocation unit in the heap is the frame and consists of one or 
more contiguous heap words with a descriptor for garbage collection. 

When a reduction (either communication or instantiation) occurs, a new 
virtual machine thread (vm_thread) is created. The new vm_thread is simply a 
frame with a pointer to the byte-code and a set of bindings, and is allocated in 
the run-queue where it waits to be scheduled for execution. Using a run-queue 
to store vm_threads ready for execution provides fairness. The heap and the 
run-queue are allocated in the bottommost and topmost areas, respectively, of 
a single memory block. They grow in opposite directions and garbage collection 
is triggered when a collision is predicted. 

Local variables, introduced with new statements, are bound to fresh channels 
allocated in the heap and the bindings (pointers) are kept in the channel stack. 
These bindings are discarded after a vm_thread finishes but the channels, in the 
heap, may remain active outside the scope of the current vm_thread through 
scope extrusion. 

Finally, expressions with builtin data-types are evaluated in the operand 
stack. Simple values do not require evaluation and are copied directly in the 
heap. Using an operand stack to perform built-in operations enables the genera- 
tion of more compact byte-codes since many otherwise explicit arguments (e.g., 
registers for the arguments and result of an operation) are implicitly located at 
the top of the stack. 

Heap Representation of Processes and Channels TyCO manipulates 
three basic kinds of processes at runtime: messages, objects and instantiations 
(figure 2). Messages and objects are located in shared communication channels. 
Internally, the virtual machine sees all these abstractions as simple frames, although 
their internal structure is distinct. A message frame holds the label of 
the method it is invoking plus a variable number of arguments. An object frame, 
on the other hand, holds a pointer to the byte-code (the location of its method 
table) plus a variable number of bindings for variables occurring free in its methods. 
An instance frame has a pointer to the byte-code for the template and a 
variable number of arguments. 




Fig. 2. Message, object, instance and channel frames. 



Channel frames hold communication queues which at run-time have either 
only objects or only messages or are empty. The first word of the three that 
compose a channel is the descriptor for the frame which in this case also carries 
the state of the channel. This state indicates the internal configuration and 
composition of the queue. The other two words hold pointers to, respectively, 
the first and last message (object) frame in the queue. 

The Machine Registers The virtual machine uses a small set of global reg- 
isters to control the program flow and to manipulate machine and user data- 
structures. Register PC (Program Counter) points to the next instruction to be 
executed. Register HP (Heap Pointer) points to the next available position in the 
heap. Registers SQ (Start Queue) and EQ (End Queue) point to the limits of the 
run-queue. Finally, registers OS (Operand Stack) and CS (Channel Stack) point 
to the last used position in each area. 

When a program starts, register PC is loaded with the address of the first 
instruction. Register CC (Current Channel) points to the channel which is cur- 
rently being used to try a reduction. Register CF (Current Frame) holds frames 
temporarily until they are either enqueued or used in a reduction. If a reduction 
takes place, the frame for the other redex’s component is kept in the register OF 
(Other Frame). Registers FV (Free Variable bindings) and PM (ParaMeter bind- 
ings) are used to hold the free variable and parameter bindings, respectively. 

The Instruction Set The virtual machine instruction set was intentionally 
designed to be minimal in size and to have a very simple layout. Instructions 
are identified by a unique opcode held in the word pointed to by the program 
counter PC. For most instructions the opcode of an instruction determines the 
number of arguments that follow in contiguous words. Alternatively, the first 
argument indicates the number of remaining arguments as in switch. In the 
sequel we let n, k range over the natural numbers, l over code labels and w over 
machine words (representing constants or heap references). 

The first set of instructions is used to allocate heap space for dynamic data- 
structures. 

msgf n,k objf n,l instf n newc k 

msgf n,k allocates a frame for a message with label k and with n words for 
the arguments. objf n,l does the same for an object with a method table at l 
and n words for free variable bindings. instf n allocates a frame for a template 
instance with n words for arguments. newc k allocates a channel in the heap and 
keeps a reference for it in the stack position k. 

Next we have instructions that move data, one word at a time, within the 
heap and between the heap and the operand and channel stacks. 

put k, w push w pop k 

put k, w copies the word w directly to the position k in the frame currently 
being assembled. push w pushes the data in w to the top of the operand stack. 
pop k moves the result of evaluating an expression from the top of the operand 
stack to the position k in the frame currently being assembled. 
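
The interplay of put, push and pop can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the tuple encoding of instructions, the list used as the frame being assembled, and the builtin operation named `add` are all hypothetical and are not taken from the TyCO instruction set.

```python
# Sketch of put/push/pop around a builtin operation.
# 'add', the tuple encoding and the list-based frame are assumptions.
def run(code, frame):
    operand_stack = []
    for instr in code:
        op, args = instr[0], instr[1:]
        if op == "put":                  # put k, w: copy w straight into slot k
            k, w = args
            frame[k] = w
        elif op == "push":               # push w: w goes on top of the operand stack
            operand_stack.append(args[0])
        elif op == "add":                # hypothetical builtin: operands are implicit
            b = operand_stack.pop()
            a = operand_stack.pop()
            operand_stack.append(a + b)
        elif op == "pop":                # pop k: move the result into slot k
            frame[args[0]] = operand_stack.pop()
        else:
            raise ValueError("unknown instruction: %s" % op)
    return frame

# Assemble a two-word frame holding the constant 9 and the value of 2 + 3.
frame = run([("put", 0, 9), ("push", 2), ("push", 3), ("add",), ("pop", 1)],
            [None, None])
```

Note that `add` carries no operand fields at all, which is the compactness argument made for the operand stack: only put, push and pop name a slot or a word explicitly.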

We also need the following basic control flow instructions. 

if l   switch n,l1,...,ln   jump l   ret 

if l jumps to label l if the value at the top of the operand stack is (boolean) 
false. switch n,l1,...,ln jumps to label lk, where k is taken from the top of 
the operand stack. jump l jumps unconditionally to the code at label l. Finally, 
ret checks the halt condition and exits if it holds; otherwise it loads another 
vm_thread for execution from the run-queue. 

For communication queues we need instructions to check and update their 
state and insert and remove items from the queues. 

state w update k reset enqueue dequeue 

state w takes the state of the channel in word w and places it at the top 
of the operand stack. update k changes the state of the current channel to k. 
In an unoptimized setting the state of a channel is 0 if it is empty, 1 if it has 
messages or 2 if it has objects. reset sets the state to 0 if the current channel 
is empty. enqueue enqueues the current frame (at CF) in the current channel (at 
CC) whereas dequeue dequeues a frame from the current channel (at CC) and 
prepares it for reduction (placing it at OF). 

Finally, we require instructions that handle reductions: both communication 
and instantiation. 

redobj w   redmsg w   instof l 

redobj w reduces the current object frame with a message. redmsg w is similar, 
reducing the current message frame with an object. Finally, instof l creates a 
new vm_thread, an instance of the byte-code at label l, with the arguments in 
the current frame. 

In addition to this basic set there is a set of operations on builtin data- 
types and also specialized instructions that implement some optimizations. For 
example, in case a sequence of words is copied preserving its order, the put and 
pop operations may be replaced by optimized versions where the k argument is 
dropped. 

The Emulator Before running a byte-code file, the emulator links the opcodes 
in the byte-code to the actual addresses of the instructions (C functions). It also 
translates integer offsets within the byte-code into absolute hardware addresses. 
This link operation avoids an extra indirection each time a new instruction is 
fetched and also avoids the computation of addresses on-the-fly. The emulator 
loop is a very small while loop adapted from the STG machine [11]. Each 
instruction is implemented as a parameterless C function that returns the address 
of the following instruction. The emulation loop starts with the first instruction 
in the byte-code and ends when a NULL pointer is returned. The emulator halts 
whenever a vm_thread ends and the run-queue is empty. 
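
The shape of such an emulation loop can be sketched in Python, with functions standing in for the C instruction functions and None standing in for the NULL pointer. The three instructions below are placeholders invented for the example; they are not TyCO opcodes.

```python
# Sketch of the emulator loop: every instruction is a parameterless
# function that returns the next function to run, or None to halt.
trace = []

def instr_first():
    trace.append("first")
    return instr_second        # address of the following instruction

def instr_second():
    trace.append("second")
    return instr_halt

def instr_halt():
    trace.append("halt")
    return None                # plays the role of the NULL pointer

def emulate(entry):
    instr = entry
    while instr is not None:   # the whole loop: call, then follow the result
        instr = instr()

emulate(instr_first)
```

Because each function hands back its successor directly, the loop needs no opcode table lookup at run-time, which is the indirection the link step is said to remove.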

Garbage Collection A major concern of the implementation is to make the 
emulator run programs in as small a heap space as possible. Efficient garbage 
collection is essential for such a goal. The emulator triggers garbage collection 
whenever the gap between the top of the heap, pointed to by HP, and the end of 
the run-queue, pointed to by EQ, is smaller than the required number of words 
to execute a new vm_thread. 

Making this test for every instruction that uses the heap before actually 
executing it would be very costly. Instead, when a byte-code file is assembled the 
maximum number of heap words required for the execution of each vm_thread 
is computed and placed in the word immediately preceding the first instruction 
of the vm_thread. At run-time, before starting executing a new vm_thread, the 
emulator checks whether there is enough space in the heap to safely run it. If not 
then garbage collection is triggered. The emulator aborts if the space reclaimed 
by the garbage collector is less than required. 
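
A minimal sketch of this per-thread check follows, assuming the heap grows upwards towards the run-queue so that the free gap is the distance between HP and EQ. The function name, the `collect` callback and the exception type are hypothetical names introduced for the illustration.

```python
class OutOfMemory(Exception):
    """Raised when a collection does not free enough words (assumed name)."""

def ensure_space(hp, eq, needed, collect):
    """Return the (hp, eq) pair to use for a vm_thread needing `needed` words.

    hp      -- index of the next free heap word
    eq      -- index of the end of the run-queue
    needed  -- the precomputed maximum stored before the thread's first instruction
    collect -- callback that runs the collector and returns the new (hp, eq)
    """
    if eq - hp >= needed:
        return hp, eq                      # enough room: run without collecting
    hp, eq = collect()                     # otherwise trigger garbage collection
    if eq - hp < needed:
        raise OutOfMemory("collector reclaimed less than required")
    return hp, eq
```

The test is paid once per vm_thread rather than once per allocating instruction, which is the saving the precomputed word is meant to buy.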

If the space between the heap limit and SQ is enough, then the garbage 
collector just shifts the run-queue upwards and returns; if not, it must perform 
a full garbage collection. We use a copying garbage collection algorithm. The 
algorithm performs one pass through the run-queue to copy the active frames. 
Active frames are those that can still be accessed by taking each item in the 
run-queue as the root of a tree and recursively following the links in the heap 
frames. The garbage collector does not use any knowledge about the internal 
structure of the frames (e.g., if they represent objects or messages). 
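
The reachability rule behind the full collection can be illustrated with a short Python sketch. It is not the collector used by the emulator: the dictionary heap, the `Ref` wrapper that marks a word as a heap reference, and the renumbering of addresses are assumptions made to keep the example self-contained.

```python
class Ref:
    """A heap reference; any other word is treated as a plain constant."""
    def __init__(self, addr):
        self.addr = addr

def copy_collect(heap, roots):
    """Copy the frames reachable from `roots` into a fresh heap.

    heap  -- dict mapping an address to a frame (a list of words)
    roots -- list of Ref, one per item in the run-queue
    Returns (new_heap, new_roots); addresses are renumbered from 0.
    """
    new_heap = {}
    forward = {}                            # old address -> new address

    def copy(ref):
        if ref.addr not in forward:
            new_addr = len(forward)
            forward[ref.addr] = new_addr
            new_heap[new_addr] = []         # reserve the slot first, so cycles terminate
            for word in heap[ref.addr]:
                # Frames are treated uniformly: follow references, copy constants.
                new_heap[new_addr].append(copy(word) if isinstance(word, Ref) else word)
        return Ref(forward[ref.addr])

    new_roots = [copy(r) for r in roots]
    return new_heap, new_roots
```

Frames never reached from a root are simply not copied, so they cost nothing to discard. A production collector would avoid the recursion used here, which is bounded by Python's recursion limit.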

Compilation TyCO programs are compiled into the virtual machine instruction 
set by the language compiler. This instruction set maps almost one-to-one with 
the byte-code representation. The syntax of the intermediate representation reflects 
exactly the way the corresponding byte-code is structured. It is important 
that the compilation preserves the nested structure of the source program in the 
final byte-code. This provides a very efficient way of extracting byte-code blocks 
at run-time when considering code mobility in distributed computations [14,24]. 
To illustrate the compilation we present a skeletal version of the unoptimized, 
intermediate code for the Cell example presented in section 2. The run-time 
environment of a vm_thread is distributed into three distinct locations: the parameter 
and free variable bindings pointed to by registers PM and FV, respectively, 
and the bindings for local variables held in the channel stack CS. In the machine 
instructions the words PM[k], FV[k] and CS[k] are represented by pk, fk and ck, 
respectively. 



main = { 
  def Cell = { 
    objf    2             % self located object with parameters 
    put     p0            % p0=self 
    put     p1            % p1=u 
    trobj   p0 = { 
      { read, write } 
      read = {            % the method 'read', p0=r, f0=self, f1=u 
        msgf    1,0       % message r![u] 
        put     f1 
        trmsg   p0 
        instof  2,Cell    % instantiation Cell[self,u] 
        put     f0 
        put     f1 
      } 
      write = {           % the method 'write', p0=v, f0=self, f1=u 
        instof  2,Cell    % instantiation Cell[self,v] 
        put     f0 
        put     p0 
      } 
    } 
  } 
  newc    c0              % creation of x 
  instof  2,Cell          % instantiation Cell[x,9] 
  put     c0 
  put     9 
  newc    c1              % creation of y 
  instof  2,Cell          % instantiation Cell[y,true] 
  put     c1 
  put     true 
} 



4 Optimizations 

This section describes a sequence of optimizations that we applied to the emulator 
with support from the compiler to improve performance. Some optimizations 
rely on type information, namely channel usage properties, gathered at compile 
time by the type inference system [6,12,13,23]. 

Compacted messages and objects This optimization is due to Turner [22]. 
Given that at any time a channel is very likely to hold at most a single object 
or message frame in its queue before a communication takes place, we optimize 
the channel layout for this case. The idea is to avoid the overhead of queuing 
and dequeuing frames and instead to access the frame contents directly. 

Optimizing frame sizes We minimize the size of the frame required by each 
process. For example, messages or objects in a synchronization channel 
do not require a next field. This minimizes heap consumption and improves 
performance. 

Single method objects These objects do not require a method table. The 
compiler generates the code for an object with an offset for the byte-code of the 
method, instead of an offset for a method table. This avoids one extra indirection 
before each method invocation. 

Fast reduction This optimization can be performed in cases where we can ensure 
that a given channel will have exactly one object. Two important cases are: 
uniform receptors [21] where a persistent object is placed in a channel and receives 
an arbitrary number of messages, and; linear synchronization channels [13] 
where an ephemeral object and a message meet once in a channel for synchronization 
and the arrival order is unknown. The main point of this optimization 
is that we never allocate the channel in the heap to hold the object (figure 3a). 
We just create a frame for the object in the heap and use its pointer directly as 
a binding (figure 3b). 





Fig. 3. Fast reduction 



In the case of uniform receptors, when a message arrives for this binding (p2 
in the figure) we can reduce at once since the binding already holds a pointer to 
the object frame. For linear channels, if we have, say, a message for p2, we first 
check the value at p2. If it is null we assign it the pointer for the message frame, 
otherwise the binding must point to an object frame and reduction is immediate. 
What distinguishes persistent from linear synchronization channels in our model 
is the fact that, since we use recursion to model persistence, a persistent object 
must always have a self referencing pointer in its closure when it goes to the run-queue. 
This preserves the frame if garbage collection is triggered. In the case of 
a linear channel this link cannot exist and so, the object frame only lasts until 
the resulting vm_thread ends. 
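
The two message-arrival cases can be sketched in Python as follows. The sketch assumes that bindings live in a plain list, that None plays the role of the null pointer, and that `reduce` is a callback standing in for the machine's reduction step; the function names are hypothetical.

```python
def send_uniform(bindings, k, msg_frame, reduce):
    """Uniform receptor: the binding always points to the object frame."""
    reduce(bindings[k], msg_frame)        # reduce at once, no channel involved

def send_linear(bindings, k, msg_frame, reduce):
    """Linear channel: the arrival order of object and message is unknown.

    Returns True if a reduction took place, False if the message was parked.
    """
    if bindings[k] is None:
        bindings[k] = msg_frame           # object not there yet: keep the message
        return False
    obj_frame = bindings[k]               # otherwise the slot holds the object frame
    reduce(obj_frame, msg_frame)          # and reduction is immediate
    return True
```

In both cases the binding slot itself replaces the channel frame, so no queue state has to be read or updated.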

Merging instructions Certain instructions occur in patterns that are very 
common, sometimes pervasive, in the intermediate assembly. Since an 
instruction-by-instruction execution is expensive (one unconditional jump per 
instruction) we create new instructions that merge a few basic ones to optimize 
for the common case. One such optimization is shown for macros trobj and 
trmsg, used to try to reduce objects and messages immediately. For example, 
trobj w checks the state of channel w. If it has messages then it dequeues a 
message frame and creates a new vm_thread in the run-queue from the reduc- 
tion with the current object. If the channel state is either empty or already has 
objects the current frame is enqueued. The case for trmsg w is the dual. 



trobj w: 
        state   w 
        switch  3,empty,msg,obj 
empty:  enqueue 
        update  2 
        jump    end 
msg:    dequeue 
        redobj  w 
        reset 
        jump    end 
obj:    enqueue 
end: 

trmsg w: 
        state   w 
        switch  3,empty,msg,obj 
empty:  enqueue 
        update  1 
        jump    end 
msg:    enqueue 
        jump    end 
obj:    dequeue 
        redmsg  w 
        reset 
end: 


Inline arguments in the run-queue Each time a template instantiation or 
a fast reduction occurs we copy the arguments directly to the run-queue. The 
advantage of the optimization is that the space used for vm_threads in the run- 
queue is reclaimed immediately after the vm_thread is selected for execution. 
This increases the number of fast (and decreases the number of full) garbage 
collections. Fast garbage collections are very light weight involving just a shift 
of the run-queue. 



5 System Performance 

Currently, we have an implementation of the TyCO programming language 
which includes a source to assembly compiler and a byte-code assembler. We 
have chosen to separate the intermediate assembly code generation from the 
byte-code generation to allow us more flexibility, namely in programming directly 
in assembly. The emulator is a very compact implementation of the virtual machine 
with about 4000 lines of C code. Figure 4 illustrates the architecture of 
the TyCO system. 



Fig. 4. The TyCO system 



We used a set of small programs to measure the efficiency of the current implementation 
relative to the concurrent programming languages Pict [19], Oz [16] 
and the JoCaml implementation of the Join calculus [8]. Both JoCaml and Oz 
may use byte-codes whereas Pict generates binary files from intermediate C code. 

The values presented in table 1 are just indicative of the system’s performance 
in terms of speed. These helped us in fine-tuning the implementation. A full 
performance evaluation will have to address larger, real world applications [18]. 

The benchmark programs used include some standard programs such as tak, 
sieve and queens, and three other larger programs, mirror, graph and fourier, 
that best illustrate the potential of the language. mirror takes a tree with 10k 
nodes, leaves and buds, and builds another one which is its mirror image. It uses 
objects for pattern matching and is deeply recursive. graph takes a connected 
graph with 128 nodes and traverses it mapping a function on each node's attribute. 
Each node of the graph is an object with an integer attribute. The 
computation time for each node is exponential on the integer attribute. fourier 
takes a list of complex numbers (implemented as objects) and computes its 
Discrete Fourier Transform. We have implemented all programs in TyCO 0.2, 
Pict 4.1 [19], Oz 2.0.4 [16] and JoCaml [2] to compare the performance of these 
systems. 

Table 1 shows, for each program, the smallest execution time of 10 consecutive 
runs in seconds. All the values observed were obtained with a default heap 
space of 256k words (when possible), in a 233MHz Pentium II machine with 
128Mb RAM, running Linux. With Oz we used the switches +optimize and 
-threadedqueries to get full performance from the Oz code. Both Pict and the 
TyCO emulator were compiled with -O3 -fomit-frame-pointer optimization 
flags. 

Program            Pict     Oz    TyCO     Program        JoCaml   TyCO 
tak*10 22,16,8     4.40   11.7   24.31     tak 22,16,8      6.98   3.02 
queens*10 10      17.0    33.1   58.09     queens 8         2.46   0.39 
sieve*10 10k       9.90   19.2   20.92     lsieve 4k       62.37   8.33 
mirror*10 10k      3.13   4.20    1.21     mirror 10k       2.51   0.24 
graph*10 128       9.62   7.10    7.24     graph 128           -      - 
fourier*10 64      2.30   2.60    0.91     fourier 64       0.95   0.21 

Table 1. Execution times 

The initial results show that TyCO's speed is clearly in the same order of 
magnitude as Pict and Oz, and indeed compares favorably considering it is emulated 
code. TyCO is clearly faster than JoCaml as can be seen in the rightmost 
portion of the table. These results were obtained for distinct problem sizes, relative 
to the first set, since for some programs JoCaml ran into some problems 
either compiling (e.g., graph) or running them (e.g., queens 10). lsieve uses 
non-builtin lists to implement the sieve of primes while sieve uses a chain of 
objects. The performance gap is higher than average in the case of functional 
programs, a fact that is explained by the optimized code generated by both 
Pict and Oz for functions. Further optimization, namely with information from 
the type system, may allow this gap to be diminished. The performance ratio 
for applications that manipulate large numbers of objects (with more than one 
method), on the other hand, clearly favors TyCO. For example mirror, which 
uses objects to implement a large tree and to encode pattern matching, performs 
nearly three times faster than Pict and even more for Oz and JoCaml. Also notice 
that all the Oz programs required an increase in the heap size up to 3M words 
and once (queens) to 6M to terminate. Compare this with the very conservative 
256k used by both TyCO and Pict. The exception for Pict is fourier where 
there is a lot of parallelism and method invocations. Pict required 1.5M words 
to run the program, as opposed to 256k words in TyCO, and was about 2.3 
times slower than TyCO. fourier shows that objects in both Pict and Oz are 
clearly less efficient than in TyCO. Pict showed lower performance on the object 
based benchmarks. This is due to the fact that the encoding of objects in the 
π-calculus is rather inefficient both in speed and heap usage. 



                         TyCO                          Pict 
Program          Heap  shift-gc  full-gc      Heap  shift-gc  full-gc 
tak 22,16,8     13487       406       18     11496       225       32 
queens 10       12975       591       13     13933       390       55 
sieve 10k       11110       218       31     11941       253       36 
mirror 10k        505         4        0      1094        27        5 
graph 128        3336       101        5      5769       139       25 
fourier 64        538        18        1      4870         2        0 

Table 2. Total heap usage and number of garbage collections. 



Table 2 shows that TyCO uses more heap space than Pict in functional 
applications such as tak. The situation changes completely when we switch to 
programs with object based data-structures. TyCO performs more shift garbage 
collections since it uses the run-queue to store the arguments of instances and 
some messages directly and, on average, TyCO uses one extra word per heap 
frame. This increases the number of collisions with the top of the heap. On the 
other hand the number of full garbage collections performed is substantially 
smaller in TyCO. This is mostly due to the combined effect of the inlining 
of arguments in the run-queue and to the fact that TyCO does not produce 
run-time heap garbage in the form of unused channels or process closures. Pict 
produces significant amounts of heap garbage in applications where objects are 
pervasive since each object is modeled with one channel for the object, one 
channel for each method and one process closure per method. Most of the time 
only a subset of these will actually be used. On the other hand an object in 
TyCO just requires a channel (self) and one closure for the method collection 
and the free variables. This effect is plainly visible in mirror for example. 

We tried to measure the amount of overhead generated by our emulation 
strategy using a small test program that has a similar emulation cycle but always 
invokes the same function. An artificial addition was introduced in the body of 
the function as a way to simulate the overhead of adjusting the program counter 
for the next instruction, as is done in the actual machine. Running this small 
program with the optimizations -O3 -fomit-frame-pointer, we found that the 
emulation overhead accounts for 15 to 28% of the total execution time, with the 
higher limit observed for functional programs. 

The run-time system for TyCO is very lightweight. It uses byte-codes to pro- 
vide a small, architecture independent, representation of programs. The engine 
of the system is a compact (the binary occupies 39k) and efficient emulator with 
light system requirements, namely it features a rather conservative use of the 
heap. 

This system architecture provides in our opinion the ideal starting point 
for the introduction of distribution and code mobility, which is the focus of an 
ongoing project. 



6 Related Work 

We briefly describe the main features of some concurrent process-based program- 
ming languages that relate to our work. 

Pict is a pure concurrent programming language based on the asynchronous 
π-calculus [19]. The run-time system is based on Turner’s abstract machine spec- 
ification and the implementation borrows from the C and OCaml programming 
languages [22]. The basic programming abstractions are processes and names 
(channels). Objects in Pict are persistent with each method implemented as an 
input process held in a distinct channel. The execution of methods in concurrent 
objects by a client process is achieved by first interacting with a server process 
(that serves requests to the object and acts like a lock ensuring mutual exclu- 
sion), followed by the method invocation proper [26]. This protocol for method 
invocation involves two synchronizations as opposed to one in TyCO, which uses 
branching structures. Moreover, this encoding of objects produces large amounts 
of computational garbage in the form of unused channels and process closures. 
Also, Pict uses replication to model recursion whereas TyCO uses recursion. Re- 
cursion not only provides a more natural programming model but also allows 
replication only when it is strictly needed, avoiding the generation of unused 
processes. 

Oz is based on the γ-calculus and combines the functional, object-oriented 
and constraint logic programming paradigms in a single system [16]. The Oz 
abstract machine is fully self-contained and its implementation is inspired by the 
AKL machine [10]. The basic abstractions are names, logical variables, proce- 
dural abstraction and cells. Constraints are built over logical variables by first- 
order logic equations, using a set of predefined predicates. Cells are primitive 
entities that maintain state and provide atomic read-write operations. Channels 
are modeled through first class entities called ports. They are explicitly manipu- 
lated queues that can be shared among threads and can be used for asynchronous 
communication. Oz procedures have encapsulated state so that objects can be 
defined directly as sets of methods (procedures) acting over their state (rep- 
resented as a set of logical variables). This representation is close to objects 
in TyCO if we view the state held in logical variables as template parameter 
bindings. 

Join implements the Join-calculus [7,8]. The JoCaml implementation inte- 
grates Join into the OCaml language. Join collapses the creation of new names, 
reception and replication into a single construct called a join pattern. Channels, 
both synchronous and asynchronous, expressions and processes are the basic ab- 
stractions. Programs are made of processes, communicating asynchronously and 
producing no values, and expressions evaluated synchronously and producing 
values. Processes communicate by sending messages on channels. Join patterns 
describe the way multiple processes (molecules) may interact with each other 
(the reactions) when receiving certain messages (molecules) producing other pro- 
cesses (molecules) plus eventual variable bindings. The JoCaml implementation 
has some fairly advanced tools for modular software development inherited from 
its development language OCaml and supports mobile computing [1]. 

7 Conclusions and Future Work 

We presented a virtual machine for a programming language based on Typed 
Concurrent Objects, a process-calculus [25]. The virtual machine emulates byte- 
code programs generated by a compiler and an assembler. The performance of 
the byte-code is enhanced with optimizations based on type information gathered 
at compile-time. Preliminary results are promising and there is scope for plenty 
of optimizations. The current implementation performs close to Pict and Oz on 
average and clearly surpasses JoCaml. TyCO is faster in applications using ob- 
jects and persistent data structures, despite being emulated. TyCO consistently 
runs in very small heap sizes, and performs significantly less garbage collection 
than either Pict or Oz. 

Future work will focus on performance evaluation and fine tuning of the sys- 
tem using larger, real world applications [18]. Channel usage information from 
type systems such as those described can dramatically optimize the assembly 
and byte-code [13,12]. An ongoing project is introducing support for mobile 
computing in this framework as proposed in [24]. A multi-threaded/parallel im- 
plementation of the current virtual machine is also being considered since it will 
provide an interesting model for parallel data-flow computations. 

The TyCO system, version 0.2 (alpha release), may be obtained from the web 
site: http://www.ncc.up.pt/~lblopes/tyco.html. 



Abstract. Byte-code representation has been used to implement sev- 
eral programming languages such as Lisp, ML, Prolog, or Java. In this 
work, we discuss the impact of several emulator optimisations for the 
Prolog system YAP. YAP obtains performance comparable or exceed- 
ing well-known Prolog systems by applying several different styles of 
optimisations, such as improving the emulation mechanism, exploiting 
the characteristics of the underlying hardware, and improving the ab- 
stract machine itself. We give throughout a detailed performance analy- 
sis, demonstrating that low-level optimisations can have a very significant 
impact on the whole system and across a range of architectures. 



1 Introduction 

Byte-code representation [14,9,20] has been used to implement several program- 
ming languages including Lisp [25], ML [17], Prolog [24], and Java [2]. In this 
technique, the program is first compiled into a lower-level intermediate repre- 
sentation, known as the bytecode of a virtual machine or abstract machine. At 
run-time, an emulator interprets the virtual machine instructions. Emulation 
can be considered an intermediate case between pure compilation and pure in- 
terpretation. As in compilation, one can perform several optimisations to the 
program, benefitting from the lower-level nature of the abstract machine. As in 
interpretation, one avoids the full complexity of native code generation. Also, 
by writing the emulator itself in a portable language such as C, one can easily 
obtain portability between different platforms. 

Prolog is an interesting example of the advantages and disadvantages of em- 
ulators. Most Prolog implementations are based on Warren’s Abstract Machine 
(WAM) [30]. The WAM is a register-based abstract machine, that uses term 
copying to represent terms and environments to represent active clauses. In the 
last few years most research in the area has concentrated on obtaining fast per- 
formance through techniques such as native code generation and through global 
optimisations based on abstract interpretation [29]. Although such techniques do 
improve performance, the resulting systems are harder to maintain, and in fact 
most current Prolog systems are either emulator-based or at least support emulators. 

The fact that abstract machines will not go away easily leads to an inter- 
esting question: how fast can we make an abstract machine go? Our experi- 
ence in implementing Prolog has shown that there are a few general issues that 
should be of interest to all abstract machine implementations. One always wants 
to use the best representation for the abstract machine. Moreover, one always 
wants to take the best advantage of the underlying hardware. Unfortunately, 
this means different things for different instruction set architectures (ISAs), dif- 
ferent implementations of an ISA (pipelined versus super-scalar), and different 
implementation-language compilers. 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 261-277, 1999. 

How much do these individual factors affect performance, and how do they 
compare with other optimisations? In this work, we discuss the impact of several 
such optimisations for a Prolog implementation. Our work was performed for the 
WAM-based system YAP (“Yet Another Prolog”), a Prolog system developed 
at the University of Porto by Luís Damas and the author [8]. We evaluate the 
system with a well-known set of small benchmarks proposed by Van Roy [28]. 
This set is quite interesting in that it compares a large set of very different Prolog 
programs. We first compare the performance of YAP with SICStus Prolog, a well- 
known high-performance commercial Prolog system [1,6]. Next, we study three 
different styles of emulator optimisations: improving the emulation mechanism, 
exploiting the characteristics of the underlying hardware, and improving the 
abstract machine itself. We conclude with a general analysis and conclusions. 

2 The YAP System 

Work on YAP started at the Universidade do Porto by Luis Damas in 1984, and 
the author joined a year later. YAP originally consisted of a compiler written 
in C, of an emulator written in m68k assembly code, and a standard predicate 
library written in Prolog and C. Releases were externally available since 1986. 
The system has been widely used. It was commercially available, and is now 
freely available in source distribution [8]. 

The YAAM (Yet Another Abstract Machine) emulator is the core of YAP. Like 
most other Prolog systems, YAP is based on David H. D. Warren’s Abstract Ma- 
chine for Prolog, usually known as the WAM [30]. Also like most other Prolog 
systems, YAP implements several extensions to the WAM, and YAP’s full instruction set 
is called the YAAM [22]. The major differences are in the compilation of unifi- 
cation of compound terms, a different scheme for indexing, and in the allocate 
instruction. Whereas the WAM compiles unification in breadth-first fashion, YAP 
has compiled unification in depth-first fashion since its original version [22]. A sec- 
ond difference is that whereas in the WAM the allocate instruction is at 
the head of the call, the YAAM only performs the allocate just before calling 
the first goal. Finally, YAP has a set of specialised instructions to avoid duplicate 
choice-points for the same goal, a problem of the original WAM, and indexes on 
the head of lists [22]. 

The YAAM emulator was initially implemented in m68k assembly. It was 
later ported to several other architectures, namely the VAX, SPARC, HP-Prism 
and MIPS. Damas implemented a macro-processor to generate different instruc- 
tions for each architecture. Experience showed that maintaining several architec- 
tures was cumbersome and ultimately inefficient, as it was quite hard to take the 
best advantage of different instruction sets (both ISAs and their implementations 
change quite significantly over time). Porting to the x86 architecture would also 
force a major redesign of the emulator. These considerations led to using C as the 
implementation language. The techniques we used to obtain high-performance 
in the new emulator are the main subject of this paper. 

A first implementation of the emulator in C showed bad performance. In 
order to obtain maximum performance the emulator was completely rewritten 
in what we named assembly-C. The key idea was that each line in C should be 
easily translatable to very few lines in assembly. In other words, by looking at a 
line in C one should be able to understand the assembly code. This results in a 
very long emulator, but gives quite fine control over compilation quality. Similar 
goals, pursued with a different strategy, motivated Peyton Jones’ C-- language [18]. 

The current emulator was specifically designed to take best advantage of 
the GNU C compiler (GCC) [23]. This compiler is widely available on a variety of 
platforms, tends to generate bug-free code of reasonable quality, and has support 
for threaded code [3,10]. Throughout the paper we will always use GCC 2.7.2 as 
the standard C compiler, on both platforms and for all systems. 

The emulator is implemented as a single function, consisting of 8648 lines 
of C code. The compiled code takes about 42 KB on Linux/x86 and 45 KB on 
Solaris/SPARC. For the benchmark set we discuss next we would expect that 
the instructions of interest should fit comfortably into most primary instruction 
caches. Larger Prolog applications may call external functions, so the working 
set should be analysed case by case (simulation tools such as SimOS [21] and 
SimICS [16] are available for this purpose). 

Evaluating a compiler is very difficult. In the case of Prolog, performance 
depends on data-structure implementation for some applications, and on con- 
trol for others. Performance issues are quite different for lists, trees, or integers. 
Some applications perform heavy search, others are fully deterministic and may 
never backtrack. On the other hand, applications may be deterministic, and still 
perform shallow backtracking. In this work we follow the set of benchmarks pre- 
viously proposed by Van Roy [28]. These are small-to-medium size programs with 
few built-ins. They are well suited to stressing the different characteristics 
of the emulator. 

The benchmark consists of 22 programs. The first applications are small 
benchmarks such as the popular naive list reversing benchmark (nreverse), the 
highly recursive Takeuchi function (tak), the quicksort algorithm (qsort), sym- 
bolic derivation (deriv), picking a serial number for an integer (serialise), the fa- 
mous 8-queens problem (queens_8), deducing a formula in Hofstadter’s Mu (mu 
and fast_mu), the zebra puzzle (zebra), and a cryptoarithmetic puzzle (crypt). 
More interesting examples are a meta-evaluation of Prolog for qsort (meta_qsort), 
a simple propositional theorem prover (prover), Gabriel’s browse benchmark 
(browse) [11], a simple Prolog compiler processing unification (unify), disjunc- 
tions (flatten), and optimisation (sdda), and a graph-reducer for t-combinators 
(reducer). Further examples include Boyer’s theorem prover benchmark, origi- 
nally from Gabriel (boyer), a simple abstract analysis system (simple_analyzer), 
a circuit designer program (nand), and the parser part of the chat-80 natural 
language system (chat_parser). The latter programs may be considered medium-size Prolog 
programs. 



2.1 YAP Performance 



In order to validate our claim that YAP is a high-performance Prolog system, we 
next compare the performance of the optimised version of YAP 4.1.10 with SICStus 
Prolog release 3 patch 6 [1], a widely available high-performance commercial 
Prolog system. SICStus Prolog supports emulated code through a C-based em- 
ulator, and native code on SPARC, m68k and MIPS platforms [12]. Both the 
SICStus Prolog native code compiler and the threaded-code emulator are con- 
sidered high-performance compared with other widely available Prolog systems. 
We perform the comparison with both native and threaded code on the SPARC 
platform, and only with threaded code on the x86 platform, as no native-code 
system is available. 



                            x86                       SPARC
Program                YAP     SICStus        YAP     SICStus Native   SICStus Emulated
nreverse*50000         4.43    7.78 (76%)     15.5    3.23 (-383%)     18.6 (20%)
tak*20                 1.41    1.77 (25%)     2.59    0.85 (-204%)     2.86 (10%)
qsort*20000            4.80    7.54 (57%)     11.7    4.06 (-188%)     12.8 (9%)
deriv*150000           8.44    12.1 (43%)     20.7    5.9 (-251%)      21.1 (2%)
serialise*50000        9.49    12.9 (36%)     20.5    7.77 (-164%)     21.4 (4%)
queens_8*5000          6.91    9.75 (41%)     15.6    4.93 (-216%)     16.0 (3%)
mu*6000                1.87    3.07 (64%)     4.65    2.14 (-117%)     5.55 (20%)
zebra*100              2.07    2.91 (41%)     3.42    2.88 (-19%)      4.24 (26%)
fast_mu*2000           1.09    1.71 (57%)     2.09    1.02 (-105%)     2.30 (10%)
crypt*1000             1.65    2.50 (55%)     3.51    2.34 (-105%)     4.05 (15%)
meta_qsort*500         1.36    1.79 (31%)     2.84    0.92 (-50%)      3.08 (8%)
prover*2000            1.10    1.54 (40%)     2.03    0.79 (-157%)     2.30 (13%)
browse*20              6.04    10.7 (77%)     15.0    7.15 (-109%)     18.1 (21%)
unify*1000             1.16    1.32 (14%)     2.13    0.67 (-217%)     2.07 (-3%)
flatten*10000          3.86    4.29 (11%)     6.34    3.52 (-80%)      6.45 (2%)
sdda*5000              1.17    1.44 (23%)     2.01    0.98 (-105%)     2.06 (3%)
reducer*100            1.71    2.42 (42%)     3.35    1.49 (-125%)     3.71 (11%)
boyer*5                1.57    1.91 (22%)     2.84    1.33 (-114%)     3.38 (19%)
simple_analyzer*100    1.03    1.31 (27%)     1.65    0.85 (-94%)      1.76 (7%)
nand*100               1.08    1.76 (63%)     1.95    0.99 (-97%)      2.58 (32%)
chat_parser*100        8.57    12.0 (40%)     14.7    7.51 (-96%)      15.7 (7%)
query*1000             1.44    2.41 (67%)     2.94    3.09 (5%)        4.14 (41%)
Average                        43.3%                  -135%            12.7%

Table 1. Application Performance 

To obtain the full picture we use two platforms: a Dell Latitude laptop with a 
266MHz Pentium-II and 64MB of memory running Linux, and a Sun Ultra-I 
workstation at 167MHz with 64MB of memory. Table 1 shows benchmark performance for both systems. The 
times are given in seconds. The number to the right of the * gives the number of 
times we ran the benchmark. Time variation in all cases was standard for Unix, 
up to a maximum of 3%. We always choose the best time from 7 runs. We 
compare with both the threaded-code and native-code implementations 
on the SPARC machine, and only with the threaded-code version on the x86 
machine. We also give in parentheses a comparison of SICStus Prolog versus 
YAP performance, as given by: 



\[
\left( \frac{\text{SICStus Prolog time}}{\text{YAP time}} - 1 \right) \times 100\%
\]
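
As a quick check of how the parenthesised figures in Table 1 can be derived, the comparison formula is easy to write down in C. This is a minimal sketch; both function names are ours and purely illustrative. The second function encodes an assumption: the negative entries in the native-code column look consistent with inverting the ratio and negating the result when SICStus Prolog is the faster system. That reading is inferred from the numbers and is not stated in the text.

```c
#include <assert.h>

/* Relative difference exactly as the formula states:
 * (SICStus time / YAP time - 1) * 100. Positive means YAP is faster. */
double relative_diff_percent(double sicstus_time, double yap_time)
{
    assert(sicstus_time > 0.0 && yap_time > 0.0);
    return (sicstus_time / yap_time - 1.0) * 100.0;
}

/* ASSUMPTION (inferred from Table 1, not stated in the text): when SICStus
 * is faster, the ratio is inverted and the result is negated. */
double symmetric_diff_percent(double sicstus_time, double yap_time)
{
    assert(sicstus_time > 0.0 && yap_time > 0.0);
    if (sicstus_time >= yap_time)
        return (sicstus_time / yap_time - 1.0) * 100.0;
    return -(yap_time / sicstus_time - 1.0) * 100.0;
}
```

For nreverse on x86 (7.78 s against 4.43 s) the first function gives a value close to the 76% reported in the table. For nreverse against SICStus native on SPARC (3.23 s against 15.5 s) the second function gives roughly -380%, near the tabulated -383%; the small gap is presumably due to rounding of the published times.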



Table 1 shows that performance of YAP is somewhat better than emulated 
SICStus Prolog on the SPARC platform, but YAP benefits from more intensive 
optimisation on the x86 platform. In the case of Solaris, YAP performs par- 
ticularly well in benchmarks that involve backtracking, whereas SICStus Prolog 
threaded performs better in benchmarks that perform heavy unification and also 
benefits from implementing shallow backtracking [5]. SICStus Prolog native is on 
average between two and three times faster than YAP, and performs particularly 
well in deterministic functional-style benchmarks. Surprisingly, YAP catches up 
in the query benchmark. The reason is that this benchmark is basically a search 
in the database followed by some arithmetic. The search component heavily 
depends on the hash-table search in an index instruction. SICStus Prolog na- 
tive is five times faster for nreverse. This is a standard benchmark for Prolog, 
which is particularly simple and is usually the first target for any optimisation. 
These results with SICStus Prolog native are consistent with published results 
for the AQUARIUS [28] and Parma [27] systems, although both of these systems 
can improve performance by applying global analysis. 

Note that there is no SICStus Prolog native for x86. SICStus Prolog was 
originally developed on m68k and then on SPARC or MIPS platforms, which are 
best supported. One would have to redesign most of the native code system to 
take full advantage of the x86 architecture [12]. Still, it is possible to implement 
native code systems for the x86: wamcc generates C code to be compiled by 
GCC [7], and Bin-Prolog is a non-WAM based system that generates native code 
for unification [26]. We have experimented with both wamcc and Bin-Prolog 
on the x86 platform and found that YAP still obtained better performance for 
these benchmarks. 

In general, both emulators are based on the same abstract machine, the 
WAM, and tend to perform comparably. This is quite clear on the SPARC ma- 
chine. We can further conclude that YAP’s emulator has good performance for 
a WAM-based emulator, and is usually between two and three times slower than 
a good Prolog native code system. YAP performs particularly well on x86 ma- 
chines. To obtain this performance, several optimisation techniques were applied. 
They are the main subject of this work. 



3 Optimising YAP 



To optimise an emulator-based system, one can apply several different tech- 
niques: 

1. One can improve the emulation mechanism. These techniques are quite in- 
teresting because they will improve every instruction, not just a few. 

2. One can optimise access to underlying hardware. This includes writing code 
to optimise super-scalar execution, or taking best advantage of instructions. 
In general, such optimisations will impact most (but not all) instructions. 

3. One can improve the abstract machine, say, by compacting several instruc- 
tions into a single instruction, by generating special instructions for com- 
monly found cases. Other examples include incorporating external function- 
ality to the abstract machine. 

These techniques only apply to a subset of instructions and tend to increase 
emulator size. The previous discussion shows they should be applied with 
care. 

4. One can improve compilation, say through improving YAAM register allo- 
cation or by applying results from global analysis. Note that to take best 
advantage of global analysis one may have to design new instructions, re- 
sulting in a larger and more complex emulator. In this regard, the consensus 
is that native code systems are best suited to this purpose, as they make it 
easier to specialise instructions. 

Global analysis is not currently being used in commercial Prolog systems, 
although robust systems are available [4,15] and work is being done 
to improve their performance for large-scale applications [13]. 



In this work we concentrate on the first three styles of optimisation, as they 
are the ones that relate to emulator design. We would like to remark that 
improvements in other compilation techniques, namely differences in abstract 
machine register allocation between compilers, seem not to be a major factor in 
performance, as for example SICStus Prolog implements a much more sophisti- 
cated abstract machine register allocator [6] than the simple one used in YAP. 

Throughout this analysis we present the impact of each optimisation as the 
ratio between a fully optimised system with and without the optimisation. This 
form of analysis is more interesting to understand whether a specific optimisa- 
tion is valuable in the final system. One should remember that most optimisa- 
tions have a cost, if only in the effort one must put into the system. Whenever 
meaningful, we present the performance impact for both the x86 and SPARC 
implementations. Throughout, we always present speedups by the formula: 



\[
\left( \frac{\text{Unoptimised time}}{\text{Optimised time}} - 1 \right) \times 100\%
\]



3.1 Improving the Emulation Mechanism 

There are several techniques that can reduce the overheads for the basic em- 
ulation mechanism. The main techniques applied in YAP are threaded code 
emulation, abstract machine register access, and prefetching. 

Threaded Code Emulation. This is a very well-known optimisation. Fig. 1 shows 
the non-threaded and threaded code implementation of a WAM instruction 
get_x_value A3,X4. Note that, unlike most abstract machines, the 
WAM is register-based, not stack-based. This specific instruction unifies the 
contents of YAAM argument register 3 with the contents of argument register 
4 (in the WAM, A and X are aliases for the set of argument registers). Threaded 
code emulation only affects the opcode field. 



(a) No threaded code        (b) Threaded code

    Op: 15                      Op: 0x23ac
    Xi: 3                       Xi: 3
    Xj: 4                       Xj: 4

Fig. 1. Threaded Code 



Without threaded-code emulation, the opcode field contains the number for 
the instruction. The number is traditionally a label associated with a switch 
statement. To execute an instruction one jumps to the beginning of the switch, 
executes the switch code (usually a direct array access), and jumps to the switch 
label. With threaded code, the instruction already contains the switch label. 
Executing an instruction is thus a question of fetching the opcode field and just 
jumping to it. 

Unfortunately, threaded code is not directly supported in standard C. One 
of the reasons for using GCC in emulator-based systems is that GCC allows labels 
as first-class values, thus greatly simplifying threaded code emulation. Table 2 
shows the impact of this technique in YAP. The technique is quite effective 
on both the x86 and SPARC architectures, and speedups are always more than 
20%. The best results on SPARC were obtained for queens, unify, query and 
boyer, all with many simple put and write instructions. The x86 machine shows 
a rather different and much wider variation. The reason, as we shall see 
next, is that this optimisation also enables other important optimisations on the 
x86 machine. 
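
The difference between the two dispatch schemes can be sketched in a few lines of C. The three-instruction machine below is hypothetical and is not the YAAM; the opcode names and function names are ours. The threaded variant relies on the GCC labels-as-values extension (`&&label` and `goto *`), and falls back to the switch version on compilers that do not define `__GNUC__`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction machine, for illustration only. */
enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* (a) Switch dispatch: the opcode field holds a small number that is
 * mapped to a handler every time an instruction is executed. */
long run_switch(const int *code)
{
    long acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        assert(!"unknown opcode"); return -1;
        }
    }
}

/* (b) Threaded dispatch: the opcode field already holds the address of
 * the handler, so executing an instruction is a fetch plus a jump. */
long run_threaded(const int *code, size_t n)
{
#if defined(__GNUC__)
    static void *const labels[OP_COUNT] = { &&do_inc, &&do_double, &&do_halt };
    void *threaded[64];
    void **pc = threaded;
    long acc = 0;
    size_t i;

    assert(n > 0 && n <= 64 && code[n - 1] == OP_HALT);
    for (i = 0; i < n; i++) {          /* load time: number -> label address */
        assert(code[i] >= 0 && code[i] < OP_COUNT);
        threaded[i] = labels[code[i]];
    }
    goto **pc++;

do_inc:    acc += 1; goto **pc++;
do_double: acc *= 2; goto **pc++;
do_halt:   return acc;
#else
    (void)n;
    return run_switch(code);           /* no labels-as-values available */
#endif
}
```

Note that the translation from opcode numbers to label addresses happens once, when the program is loaded, which is why only the opcode field of each instruction changes.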

Table 2. Emulation Mechanism Speedups 

Abstract Machine Register Access. Fig. 2 shows the same principle at work for 
the abstract machine register access optimisation. Instead of adding the register 
offsets to a variable X, we can have abstract machine registers at fixed memory 
positions and store the address for each X[i]. Usually, each instruction will have 
at least one register access, and quite often two. In the SPARC architecture, 
we guarantee the emulator will have the base address for the abstract machine 
registers in a physical register anyway, so we are saving an add and possibly 
an extra register. In an x86 machine, the optimisation allows using indirection, 
instead of having to use indexed addressing. The cost is that one needs a larger 
YAAM instruction: whereas we could store the register number in a 16-bit word, 
we now need to store a full address. 
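
The trade-off can be made concrete with a small C sketch. The types and names below (`Cell`, `InstrIndexed`, `InstrDirect`, `resolve`) are hypothetical, and a plain register-to-register copy stands in for the real unification instruction; the point is only the operand encoding.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;

enum { NUM_X_REGS = 256 };
Cell X[NUM_X_REGS];            /* register bank at a fixed memory position */

/* (a) No register optimisation: operands are 16-bit register numbers. */
typedef struct { uint16_t xi, xj; } InstrIndexed;

/* (b) Register optimisation: operands are the register addresses. */
typedef struct { Cell *xi, *xj; } InstrDirect;

/* Every access adds the register number to the base of the bank. */
void move_indexed(Cell *base, const InstrIndexed *ins)
{
    base[ins->xj] = base[ins->xi];
}

/* The address was computed once, at load time; access is a plain
 * indirection. */
void move_direct(const InstrDirect *ins)
{
    *ins->xj = *ins->xi;
}

/* Load-time translation from form (a) to form (b). */
InstrDirect resolve(const InstrIndexed *ins)
{
    InstrDirect d;
    assert(ins->xi < NUM_X_REGS && ins->xj < NUM_X_REGS);
    d.xi = &X[ins->xi];
    d.xj = &X[ins->xj];
    return d;
}
```

On typical 32-bit and 64-bit targets `sizeof(InstrDirect)` is at least twice `sizeof(InstrIndexed)`, which is the instruction-size cost mentioned above.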

In both cases, the improvements are minor, as shown in Table 2. The simpler 
addressing mode results in constant improvements for the x86 implementation. 
Performance on the SPARC machine varies between an 8% speedup and a 17% 
slowdown, probably due to the larger instruction size. The effect for boyer is 
quite interesting. We have reproduced this result and believe it to be a cache effect, 
as it could not be reproduced on an Ultra-II machine. 

Hard-coding YAAM registers does have a second drawback that is not im- 
mediately obvious. Imagine one wants to implement a multi-threaded system. 
One would like to have several engines working concurrently. With this optimi- 
sation, the engines will have to share the same YAAM register set, meaning that 
context-switches between threads will be more expensive, as we need to save the 
X registers. Moreover, context-switching may only be performed at well defined 
points where we know how many YAAM registers are in use. 

(a) No register optimisation        (b) Register optimisation 

Fig. 2. Improving Register Access 



Prefetching. Using threaded code still generates a stall. One needs to increment 
the program counter, fetch the new opcode, and jump. The CPU will stall be- 
fore jumping, as it needs to wait for the opcode. To avoid this problem, one 
can prefetch the new opcode, and guarantee that the CPU will know the jump 
address in advance. 
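
A minimal sketch of opcode prefetching in a threaded emulator follows. The machine is hypothetical (an accumulator with add, multiply and halt), the names are ours, and the code requires the GCC labels-as-values extension. Each handler loads the next opcode into the temporary `next` before doing its own work, so the jump target is known by the time the final `goto` is reached.

```c
#include <assert.h>

/* Hypothetical accumulator machine, for illustration only. */
enum { OP_ADD, OP_MUL, OP_HALT, OP_COUNT };
enum { MAX_CODE = 32 };

typedef struct {
    void *op;    /* threaded opcode: address of the handler label */
    long  arg;   /* immediate operand */
} Instr;

long run_prefetch(const int *ops, const long *args, int n)
{
    static void *const labels[OP_COUNT] = { &&do_add, &&do_mul, &&do_halt };
    Instr code[MAX_CODE];
    const Instr *pc = code;
    void *next;              /* the extra temporary the technique needs */
    long acc = 0;
    int i;

    /* The program must end in OP_HALT so that pc[1] is always valid
     * inside the non-halting handlers. */
    assert(n > 0 && n <= MAX_CODE && ops[n - 1] == OP_HALT);
    for (i = 0; i < n; i++) {
        assert(ops[i] >= 0 && ops[i] < OP_COUNT);
        code[i].op = labels[ops[i]];
        code[i].arg = args[i];
    }
    goto *pc->op;

do_add:
    next = pc[1].op;         /* prefetch the next opcode first */
    acc += pc->arg;          /* then do the work of this instruction */
    pc++;
    goto *next;
do_mul:
    next = pc[1].op;
    acc *= pc->arg;
    pc++;
    goto *next;
do_halt:
    return acc;
}
```

Whether `next` ends up in a machine register is up to the compiler; if it is spilled to memory the prefetch only adds work.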

Note that the technique requires a new temporary variable to hold the 
prefetched value. This variable must be allocated in an extra machine register, 
otherwise execution may actually slow down. The x86 is short of registers, so we 
only applied the technique to instructions where we know the compiler will have 
sufficient registers. The SPARC architecture should have sufficient registers, and 
we apply the technique everywhere. Table 2 shows the results. In both cases they 
are quite disappointing (in fact the technique was originally designed for 
single-issue pipeline-based implementations, such as the 80486, where it did perform 
better). We believe the problem is that the CPUs are super-scalar and hence 
quite effective at prefetching arguments themselves. This removes some of the 
advantages of using software prefetching. Software prefetching further requires 
an extra register and thus adds extra pressure on the compiler. 

We can apply a similar technique to argument access. Imagine the following 
sequence of instructions: 

get_x_variable X1, A2 
get_list X2 

The observation is that in the WAM an instruction such as get_x_variable 
or get_x_value is most often followed by a get instruction (corresponding to 
the next argument). Moreover, the first argument of that following instruction will ac- 
cess a YAAM register. The current instruction can therefore prefetch the 
register needed by its successor, thus making it available to the next instruction. 
Unfortunately, this optimisation requires specialised instructions because in gen- 
eral we do not know whether the previous instruction did the prefetching or not. 
The first argument is quite important in the WAM, as on entering a proce- 
dure the first instruction is likely to be either an indexing instruction or a try 
instruction. In both cases, that instruction will require the first argument. To 
optimise this case, instructions that call procedures must fetch the first argu- 
ment in advance. We have implemented the optimisation for RISC architectures, 
where the prefetched YAAM register can be placed in an unused abstract ma- 
chine register. The optimisation is not supported on the x86, because the 
architecture does not have sufficient registers. 



3.2 Taking Advantage of the Hardware 

Ideally, given a fragment of code, the C-compiler should be able to take best 
advantage of the available hardware. In practice, C-compilers have several lim- 
itations and require a bit of help in order to achieve good performance. One 
major consideration concerns register allocation. Good register allocation is fun- 
damental to obtain best performance, but is not always obtained by just giving 
the code to the compiler. A second consideration concerns scheduling within the 
CPU. We want to minimise stalls on memory and on branches. Ideally, the com- 
piler should be able to reorder instructions for best performance. Unfortunately, 
C-compilers must worry about pointer aliasing. This is specifically a problem 
with emulators for high-level languages, as these languages heavily manipulate 
pointers. 

To obtain maximum performance, YAP’s emulator was written assuming a 
load-store mechanism. We also tried to maximise possible concurrency between 
instructions in the code. Unfortunately, it is very hard to measure the impact of 
these optimisations, as they are really embedded in the fabric of the emulator. 
In the next paragraphs we concentrate on two optimisations that are easier to 
measure: register allocation, and tag schemes. 

Improving Register Allocation. Machine register allocation is a totally different 
problem in the x86 and in the RISC architectures. This said, there is a simple 
rule of thumb we found of use in both cases: reduce the scope of temporary 
variables to the very minimum. Often one reuses a temporary variable, say i, 
for different purposes. For a function as complex as the emulator, this 
complicates register allocation, which is being stressed to the limit. 
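
The rule of thumb can be illustrated with a toy function written in both styles. The example is ours and deliberately tiny; in a function of several thousand lines the same pattern is what keeps the live range of each temporary short. How much it helps depends on the compiler, so this should be read as a sketch of the coding style rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>

/* One temporary, t, reused for two unrelated purposes: its scope covers
 * the whole function, so an allocator that works on declared variables
 * must treat it as live throughout. */
long sum_plus_max_reused(const long *v, size_t n)
{
    long t, result;
    size_t i;

    assert(n > 0);
    t = 0;                               /* t means "sum" here ... */
    for (i = 0; i < n; i++)
        t += v[i];
    result = t;
    t = v[0];                            /* ... and "max" here */
    for (i = 1; i < n; i++)
        if (v[i] > t)
            t = v[i];
    return result + t;
}

/* Same computation with each temporary confined to the block that needs
 * it, which makes the short live ranges explicit. */
long sum_plus_max_scoped(const long *v, size_t n)
{
    long result;

    assert(n > 0);
    {
        long sum = 0;
        size_t i;
        for (i = 0; i < n; i++)
            sum += v[i];
        result = sum;
    }
    {
        long max = v[0];
        size_t i;
        for (i = 1; i < n; i++)
            if (v[i] > max)
                max = v[i];
        result += max;
    }
    return result;
}
```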

In the case of RISC architectures, the problem is how many abstract machine
registers we can keep in machine registers while still having the compiler do a
decent allocation. In the specific case of the SPARC architecture, we found we
could declare 7 registers to hold copies of WAM registers within the emulator.
Moreover, in GCC we can declare 3 registers to be global variables storing YAAM
registers. Adding further registers decreases code quality.

In the case of the x86 architecture, we have 7 registers available, counting
the frame pointer. We have used three techniques to improve performance:

1. All the abstract machine registers are stored in the emulator’s activation
frame. This means we can access them through the stack pointer. We can also
guarantee they are very close to the top of the stack and use the corresponding
x86 optimised instructions. This is not as effective for superscalar
implementations as for older pipelined implementations.

2. We can explicitly copy a YAAM register to a temporary variable, if the
instruction heavily depends on that register. This mechanism is particularly
useful on the x86 architecture. Note that for RISC machines we can store quite
a few YAAM registers as machine registers, so we must be careful lest this
optimisation confuse the compiler.

We have found experimentally that we can always cache one abstract machine
register, the global stack pointer H. The explanation is that, in order not to
lose efficiency on RISC machines such as the SPARC, we need the compiler to
alias the copy of H back to the original H. This works well for H because uses
of H almost never cross uses of other pointers. For three other important
abstract machine registers, the current environment pointer, ENV, the trail
pointer, TR, and the latest choice-point pointer, B, we need conditional code
for the x86; otherwise RISC code would suffer.

3. We can force YAAM registers to be x86 registers. In general, this is a bad
idea because it dramatically decreases code quality. There is an important
exception, though: the YAAM program counter is so important in program
execution that there are large benefits in storing it in an x86 register.

It is very hard to study the impact of all these optimisations, as they are 
interspersed in the code. In the case of the x86, we next give performance data on 
x86-specific copying, and on using the abstract machine PC as an x86 register. 
In the case of the SPARC, we give performance data on copying the YAAM 
registers. 

Table 3 shows the results. The x86 architecture is register-starved, and
anything we can do to optimise register allocation is clearly welcome. Note
that in theory the C-compiler should be able to reuse copies of the abstract
machine registers itself, and this optimisation should be unnecessary. The
second optimisation again shows the limitations of the compiler. The YAAM
program counter is by far the most frequently referenced variable in the
abstract machine. Storing it in a register is always highly beneficial, even if
it worsens register allocation. We believe this optimisation to be the main
improvement in YAP over the SICStus Prolog implementation.

The third column shows the advantage of copying the abstract machine registers
to machine registers on the SPARC (the exception is the YAAM PC, which is
always in a register). Ideally, the compiler should do this copy itself.
Surprisingly, this technique has a greater impact than actually using threaded
code.

4 Improving the Abstract Machine 

The previous optimisations are designed to take best advantage of the
instruction set and compiler. Their implementation decreases the YAAM’s CPI
(cycles per instruction). One can also consider optimisations that improve
execution by changing the abstract machine specification itself. We found two
optimisations to be most important: instruction merging and abstract machine
extension.

Table 3. Register Allocation Speedups

For completeness’ sake, we note that YAP includes further, very WAM-specific
optimisations. The major one concerns how to handle reading or writing
sub-arguments of compound terms. In the original WAM this is supported by an
extra register, the RWREG, that is consulted at every unification instruction.
In the YAAM this is implemented by having two opcode fields in each unify
instruction. Instructions in write mode can thus use a different opcode from
instructions in read mode. There are also possible optimisations not
implemented in YAP. For instance, SICStus Prolog has specialised instructions
for the WAM’s A1 register (the first argument in a call). The motivation is
that this is the most commonly used register and, if processed separately, it
may be stored in a machine register. We do not implement this optimisation
because it would significantly increase compilation times, and because the
extra instructions would make the system harder to maintain.

Table 4. Abstract Machine Speedups

Instruction Merging. The idea of instruction merging is straightforward: to
reduce the overheads of emulation by joining several instructions into a single
instruction. The idea has been shown to be quite effective [19]. Unfortunately,
combining all pairs of instructions would square emulator size, whereas many
combinations would never appear in practice. We therefore need to consider
which combinations are more frequent. This clearly depends on our experience,
and specifically on the programs we want to optimise. We next discuss the main
optimisations as applied to YAP; note that this discussion assumes an
understanding of WAM execution:

— It is common to have several void sub-arguments in a compound term. These 
sub-arguments can be joined into a single instruction with little overhead. 

— A related common case is where we have two sub-arguments that are vari- 
ables. There are four cases depending on whether it is a first access or not 
to each variable sub-argument. 

— It is quite common in Prolog to access a list whose head is a variable. This 
means that get_list and unify_variable instructions can be merged. 

— The last case optimises the recursive clause for the member/2 predicate: we
access a list where the first argument is void and the second is the first
occurrence of a variable.
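
As a rough illustration of two of the merges listed above, the following Python
sketch runs a peephole pass over a list of WAM-style instructions. It is not
YAP's implementation, which lives in a C emulator: the tuple encoding of
instructions and the merged opcode names unify_n_voids and
get_list_unify_variable are assumptions made only for this example.

```python
# Hypothetical peephole pass illustrating instruction merging.
# Instructions are tuples: (opcode, operand, ...). The merged opcode
# names below are invented for this sketch.

def merge_instructions(code):
    """Return a new instruction list with two kinds of merges applied."""
    merged = []
    i = 0
    while i < len(code):
        op = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op[0] == "unify_void":
            # Collapse a run of void sub-arguments into one counted instruction.
            count = 0
            while i < len(code) and code[i][0] == "unify_void":
                count += 1
                i += 1
            merged.append(("unify_n_voids", count) if count > 1 else ("unify_void",))
            continue
        if op[0] == "get_list" and nxt is not None and nxt[0] == "unify_variable":
            # A list whose head is a variable: one dispatch instead of two.
            merged.append(("get_list_unify_variable", op[1], nxt[1]))
            i += 2
            continue
        merged.append(op)
        i += 1
    return merged


program = [
    ("get_list", "A1"),
    ("unify_variable", "X1"),
    ("unify_void",),
    ("unify_void",),
    ("unify_void",),
    ("proceed",),
]
print(merge_instructions(program))
```

Each merge saves one or more emulator dispatches, but every merged opcode is
one more case in the emulator, which is why only frequent combinations are
worth considering.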

Performance analysis is given in the Merge columns of table 4. Notice that
there are negative results which, we believe, result from variations in
timings. The results show the optimisation to be reasonably effective in the
few cases where it applies, but only 6 out of 22 benchmarks benefit
significantly. Moreover, among the larger programs only nand benefits. In
general, Prolog programs tend to use a reasonably large number of instructions.
Abstract instruction merging would need to be performed extensively, or it will
hardly result in major speedups for most benchmarks.

Extending the Abstract Machine. Most real Prolog applications depend on
external functionality, such as:

— meta-predicates, such as var/1 and friends;

— arithmetic, through the is/2 built-in;

— explicit unification, through the =/2 built-in;

— term manipulation or comparison, for instance, argument access with arg/3.

YAP implements most of the meta-predicates, integer arithmetic (although not
comparison), unification, and the major term manipulation built-ins, functor/3
and arg/3, directly as abstract machine operations. Table 4 shows the impact on
performance as compared to a version that does not improve arithmetic and uses
the same code for meta-predicates, explicit unification and term manipulation,
except that this code is now implemented outside the YAAM. Note that the N/A
result is due to a bug in the non-optimised system.

Table 4 shows this optimisation to be effective for 11 out of 22 benchmarks. 
Also note that quite a few of the larger benchmarks benefit from this opti- 
misation. In general, real Prolog programs depend heavily on features such as 
arithmetic or meta-predicates that are not available in the original WAM. This 
is particularly true for large programs. The best results were obtained for ap- 
plications that require arithmetic. The impressive improvement in performance 
reflects effort in improving both compilation of arithmetic and its implemen- 
tation as abstract machine instructions. Note that arithmetic comparisons are 
still performed outside the abstract machine, so performance could be further 
improved. 

5 Conclusions and Future Work 

We have discussed a set of optimisations for a Prolog emulator. Most of these 
techniques apply to any emulator. We have shown that substantial performance 
improvements can be obtained from improved register allocation, threaded code, 
and abstract machine extensions. We have found instruction merging not to be 
widely effective, and software prefetching to be of limited impact for modern 
superscalar machines. 

One interesting advantage of emulators is that they provide a perfect
environment for experimenting with optimisation techniques. Ideally, the impact
of an optimisation is replicated each of the many times the instruction is
executed. This makes it quite possible to do considerable hand-tuning, and in
the best cases approach the performance of native-code systems. The major
disadvantage is introduced by the granularity of abstract machine instructions.
We have seen that it is not straightforward to optimise across instructions,
and that instruction merging quickly increases instruction size for little
benefit. In this regard, it will be interesting to see how new developments in
VLIW-style CPUs will impact the emulated versus native code ratios.

In general, the more complex the basic operations of the language are, the
better an emulator will perform. In this regard, Prolog holds an intermediate
position between languages such as Java and ML, on the one hand, and constraint
or concurrent languages, on the other. Our general conclusion is that, at least
for Prolog, emulation is still a valid technology for the implementation of
high-level languages. Performance is acceptable and the general system is
simpler and easier to adapt.

Further optimisations are possible. We mentioned using prefetching instructions
from modern ISAs. We have also experimented with inline assembly for frequent
WAM operations such as trailing, but we found that this makes life too hard for
the compiler. We would also like to obtain a mathematical description of the
relationship between each optimisation and benchmark performance. We have
already worked on classifying instructions and deriving instruction
frequencies. Further work requires timing the execution of individual
instructions.

Abstract. In this paper, we introduce a programming language for an
abductive reasoner. We propose the syntax for an imperative language 
in the usual manner and its semantics as a mapping from the language 
statements to an abductive logic program. The design is such that any 
semantics for abductive logic programs could be taken as the basic se- 
mantics for the programming language that we propose. In this way, we 
build upon existing formalizations of abductive reasoning and abductive 
logic programming. One innovative aspect of this work is that the agent 
processing and executing OPENLOG programs will stay open to the en- 
vironment and will allow for changes in its environment and assimilation 
of new information generated by these changes. 



1 Introduction 

Abduction is a non-valid form of reasoning in which one infers the premises of
a rule given the consequent. This form of reasoning is not valid in classical
first-order logic since, for instance, one is not allowed to deduce the atom b
from the clause h ← b and the atom h. However, in general, in the presence of h
and this clause, our intuition allows us to say that b could well be the case.
That is, we are allowed to offer b as an explanation or a hypothesis for h in
the context of that clause when we do not have more information. This is
abduction.
An abductive reasoner uses abduction as one of its inference rules. Abduction 
enables reasoning in the absence of full information about a particular problem 
or domain of knowledge. 

In this paper, we introduce a programming language for an abductive reasoner.
We propose the syntax for an imperative language in the usual manner
(summarized in table 1) and define its semantics as a mapping from the language
statements to an abductive logic program (shown in table 2). The design
is such that any semantics for abductive logic programs could be taken as the 
basic semantics for the programming language that we propose. In this way, we 
build upon existing formalizations of abductive reasoning and abductive logic 
programming. 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 278-293, 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999
A substantial effort has been made to formalize abductive reasoning. Poole’s
Theorist [27] was the first to incorporate the use of abduction for
non-monotonic reasoning. Eshghi and Kowalski [10] have exploited the
similarities between abduction and negation as failure and provided a proof
procedure based on a transformation of logic programs with negation into logic
programs with abducible atoms. De Kleer incorporates abduction into the
so-called truth maintenance systems to obtain the ATMS [7]. Also, in [3],
L. Console, D. Theseider Dupré and P. Torasso analyse the relationships between
abduction and deduction and define what they call an abduction problem as a
pair <T, φ> where:

1. T (the domain theory) is a hierarchical logic program¹ whose abducible
atoms are the ones not occurring in the head of any clause.

2. φ (the observations to be explained) is a consistent conjunction of literals
with no occurrence of abducible atoms.

A solution to the abduction problem is a set of abducible atoms that, together
with T, can be used to explain φ.
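
To make the definition concrete, here is a minimal propositional sketch in
Python. It is only an illustration under simplifying assumptions (ground atoms,
no negation, a single observed atom rather than a conjunction); the
dictionary encoding of T and the function names abducibles and explanations
are hypothetical and not taken from [3].

```python
from itertools import product

# A theory maps each head atom to a list of alternative bodies;
# each body is a list of atoms. {"h": [["b"]]} encodes the clause h <- b.

def abducibles(theory):
    """Atoms that occur in some clause body but in no clause head."""
    body_atoms = {atom for bodies in theory.values() for body in bodies for atom in body}
    return body_atoms - set(theory)

def explanations(theory, atom):
    """All sets of abducible atoms that, together with the theory, entail `atom`.

    Terminates because the theory is assumed hierarchical (no recursive rules).
    An atom with no defining clause is treated as abducible.
    """
    if atom not in theory:
        return [frozenset([atom])]
    results = []
    for body in theory[atom]:
        # Explain every atom of the body, then combine one choice per atom.
        for combo in product(*(explanations(theory, b) for b in body)):
            results.append(frozenset().union(*combo))
    return results


theory = {"h": [["b"], ["c", "d"]], "c": [["e"]]}
print(abducibles(theory))         # the atoms b, d and e
print(explanations(theory, "h"))  # one explanation per way of proving h
```

With the single clause h ← b, the only explanation offered for h is the set
containing b, which is the inference described in the introduction.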

The purpose of imposing structures such as <T, φ> upon a reasoning problem is
to create frameworks in which the semantics of each component and its
relationships with other components can be established in a declarative manner.
A framework is a structure that distinguishes between types of elements in a
formalization. For instance, the framework <T, φ, Ab> could be used to say that
one has a theory T, a set of observations φ and that these observations can be
explained by abducing predicates in T whose names appear in Ab (abducible
predicates). These distinctions are then used to justify differential treatment
of each type of component. In the cases considered here, for instance,
abducible predicates and non-abducible predicates, so separated by the
framework, are processed differently. The distinction captures the fact that
the former, unlike the latter, denote uncertain or incomplete information.

The use of frameworks has been taken further by Kakas and Mancarella [17],
Denecker and De Schreye [9], Toni [33], Fung [14] and, more recently, Wetzel et
al. [36], [35] in the context of incorporating abduction into constraint logic
programming. In [16] there is an overview of the first efforts to incorporate
abduction into logic programs. In [13] there is a preliminary description of
the abductive framework that we have used (in [6]) to formalize the reasoning
mechanism of an agent. In this work, the agent is an abductive reasoner that
uses abduction to plan its actions to achieve its goals.

2 An Abductive Proof Procedure 

In [13], Fung and Kowalski introduce an abductive proof procedure aimed at
supporting abductive reasoning on predicate logic and, in particular, on
abductive logic programs. The iff proof procedure, as they call it (iffPP
hereafter), is an aggregate of the following inference rules: unfolding,
propagation, splitting, case analysis, factoring, logical simplifications and a
set of rewriting rules to deal with equalities, plus the abductive rule
described above. Fung and Kowalski also produce soundness and completeness
results in [13]. We describe an implementation of iffPP in [6] together with
some examples of how it could be used.

¹ A hierarchical logic program is a logic program without recursive rules.

A proof procedure can be seen as specifying an abstract machine that transforms
formulae into other formulae. It could even be seen as “an implementation
independent interpreter for” the language of those formulae [34]. That is, a
proof procedure determines an operational semantics for logic programs (ibid.).
Thus, iffPP specifies an operational semantics for abductive logic programs. By
relating this operational semantics to a programming language, one can get to
program the abductive reasoner for particular applications, such as
hypothetical reasoning and problem solving (e.g. planning) in specific
knowledge domains and for pre-determined tasks.

In this paper, we go a step further in the definition of a programming language
for the abductive reasoner defined by iffPP. Instead of simply relying on the
procedural interpretation of (abductive) logic programs, we introduce a more
conventional imperative language and explain how this can be mapped onto
abductive logic programs of a special sort. These abductive logic programs lend
themselves to a form of default reasoning that extends the traditional use of
programming languages, i.e., the new definition implies a restatement of what a
program is.

In the context of this research, a program is seen as a scheme that an agent 
uses to generate plans to achieve some specified goal. These plans ought to lead 
that agent to display an effective, goal-oriented behaviour that, nevertheless, 
caters for changes in the environment due to other independent processes and 
agencies. This means that, although the agent would be following a well-defined 
program, it would stay open to the environment and allow for changes in its cir- 
cumstances and the assimilation of new information generated by these changes. 

So defined, a program is not a closed and strict set of instructions but a list
of assertions that can be combined with assertions from other programs. One
advantage of this definition is that the code being executed remains open to
updates required by changes in the circumstances of execution. The other
important advantage is that it allows the executor of the program to perform a
form of default reasoning. By assuming a certain set of circumstances, the
agent will execute a certain sequence of actions. If the circumstances change,
perhaps another sequence will be offered for execution.

The paper is organized as follows: The next section shows an example to
illustrate the principles of abductive programming. The following section
introduces the syntax and semantics of a new logic programming language for
abductive programming: OPENLOG. Then, the semantics of OPENLOG and its
relationship with background theories based on the Event Calculus [20] are
explained. A discussion of the characteristics and advantages of OPENLOG is
also presented before concluding with some remarks about future research.
3 Toy Examples of Abductive Programming with OPENLOG

[Figure with two panels, a) and b)]

Fig. 1. Two Blocks-World scenarios for planning


In this section we illustrate with examples the relationship between abduc- 
tion and planning based on pre-programmed routines. Consider the scenario in 
figure 1: 

Example 1. An agent is presented with the challenge of climbing a mountain of 
blocks to get to the top. The agent can climb one block at a time provided, of 
course, that the block is there and at the same level (i.e. just in front). The 
planning problem is then to decide which blocks to climb onto and in which 
order. An OPENLOG procedure to guide this planning could be: 

proc climb
begin
   if infront( Block ) and currentlevel( Level )
      and Block is_higher_than Level then
   begin
      step_on( Block ) ; climb
   end
end



Given the scenario in figure 1 a) and the OPENLOG code above, an abductive
agent might generate the alternative plans:

do(self, step_on(a), t1) ∧ t1 < t2 ∧ do(self, step_on(c), t2) and
do(self, step_on(b), t1) ∧ t1 < t2 ∧ do(self, step_on(c), t2),

where do(Agent, A, T) can be read as “Agent does action A at time T”. This can
be done by relating every OPENLOG program to an abductive logic program that
refers to the predicates do and <, and declaring those predicates as
abducibles. This mapping is provided by the definition of the predicate done as
shown in section 6 and in section 7.
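
The way the recursive procedure climb gives rise to alternative plans can be
sketched in Python as a generator. This is only an illustration, not the
abductive machinery of iffPP: since the figure is not available here, the block
layouts SCENARIO_A and SCENARIO_B and the table HEIGHT are assumptions chosen
to agree with the plans quoted in the text, and the position of an action in
the returned list stands in for the ordering constraints on time points.

```python
# Assumed layouts (hypothetical): which blocks are in front of the agent
# from each position. In scenario b) block a is missing.
HEIGHT = {"a": 1, "b": 1, "c": 2}
SCENARIO_A = {"ground": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
SCENARIO_B = {"ground": ["b"], "b": ["c"], "c": []}

def climb(in_front, position="ground", level=0):
    """Yield every plan (a list of step_on actions) that climb can produce."""
    # The test of the 'if': a block in front that is higher than the current level.
    candidates = [blk for blk in in_front[position] if HEIGHT[blk] > level]
    if not candidates:
        yield []          # the test fails, so the procedure ends with no action
        return
    for blk in candidates:
        # step_on(Block) ; climb
        for rest in climb(in_front, blk, HEIGHT[blk]):
            yield [("step_on", blk)] + rest


for plan in climb(SCENARIO_A):
    print(plan)
```

Under these assumptions the generator yields two plans for scenario a), one
through block a and one through block b, and a single plan through block b for
scenario b).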

Still in this scenario, it might be the case that the agent interpreting this
code learns that at some time ti:

t1 < ti < t2 ∧ do(somebodyelse, remove(c), ti),

i.e. an event happens that terminates the block c being where it is. The agent
ought then to predict that its action step_on(c) will fail. This could be done,
for instance, if the agent represented and, of course, processed an integrity
constraint such as: (do(Ag, Ac, T) → preconds(Ac, T)), where preconds verifies
the preconditions of each action.
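
A small Python sketch may help to see how such an integrity constraint exposes
a doomed action. It is a simplification under stated assumptions: time points
are plain integers (t1 = 1, ti = 2, t2 = 3), the only precondition modelled is
that the block to be stepped on has not been removed earlier, and the helper
names block_present, preconds and violations are hypothetical.

```python
# Events are triples (agent, action, time); an action is (name, block).

def block_present(block, time, events):
    """A block is present at `time` unless it was removed strictly before."""
    return not any(act == ("remove", block) and t < time for (_, act, t) in events)

def preconds(action, time, events):
    """Preconditions of an action; only step_on has one in this sketch."""
    kind, block = action
    if kind == "step_on":
        return block_present(block, time, events)
    return True

def violations(events):
    """The do(Ag, Ac, T) atoms for which preconds(Ac, T) does not hold."""
    return [(ag, ac, t) for (ag, ac, t) in events if not preconds(ac, t, events)]


history = [
    ("self", ("step_on", "a"), 1),
    ("somebodyelse", ("remove", "c"), 2),
    ("self", ("step_on", "c"), 3),
]
print(violations(history))
```

The planned step_on(c) is reported as violating the constraint, so the agent
can predict the failure before reaching that step.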

This type of reasoning that combines abduction with integrity constraints is
the main feature of iffPP. The agent using iffPP as its reasoning procedure may
predict that an action of type Action will fail and then either dismiss the
corresponding plan (i.e. no longer consider it for execution) or repair the
plan by abducing the (repairing) actions required to make preconds(Action, T)
hold at the right time. Transforming iffPP into a planner that allows
replanning must be done with care, however, because, as we argue below, it may
lead to “over-generation” of abducibles, i.e. to produce too many “repairing”
alternatives (some of them with their own problems due to ramifications).

What we have done to tackle the original problem (transforming iffPP into the
planner for an open agent) is to combine OPENLOG (with its solution for the
over-generation of abducibles) with another programming language, this one
based on integrity constraints, which we call ACTILOG [6]. We focus this paper
on OPENLOG, due to space constraints and because integrity constraints
equivalent to the one above (that involves the predicate preconds) can also be
produced from OPENLOG code.



3.1 Over-Generation of Abducibles 

As we said, one has to be careful with the generation of abducible predicates.
Notice, for instance, that in figure 1 b) the only feasible plan is:

do(self, step_on(b), t1) ∧ t1 < t2 ∧ do(self, step_on(c), t2),

because the block a is not there. The agent may know about actions that cause
infront(a) to be the case (such as, say, put_block_in_front(a)). It could
therefore schedule one of these actions to repair the plan. In the (more usual)
case where the agent cannot actually perform the action, the only way to
prevent the scheduling of the repairing action is to perform some (non-trivial)
extra computation to establish, for instance, that the agent will not be able
to “move the block a” in the current circumstances.

This type of behaviour is what one would get from a general purpose abductive
reasoner like iffPP. It will “generate” all the possible combinations of
abducible actions to satisfy its goals. And these may be too many, irrelevant
or impossible. Observe that this general reasoner will generate the same sets
of step_on actions for both situations a) and b) in figure 1. Moreover, it will
add actions to repair the plans (all of them) in all the possible ways (e.g.
moving blocks so that infront holds for all of them). The problem becomes even
more complex if one considers other physical or spatial effects the agent
should be taking care of (like how many blocks should be regarded as being in
front of the agent).

We want to save the extra computation forced on the agent by these repairing
actions and other effects. We want to use the structures in the program (climb in 
this case) to decide when the agent should be testing the environment and when 
it should be abducing actions to achieve its goals. This is a form of interleaving 
testing and planning. 

One of the advantages of our approach is that, as part of defining the mapping
procedural code → abductive logic programs, we can inhibit that
“over-generation” of abducible predicates. The strategy for this is simple: an
expression C appearing in if C then ... will not lead to the abduction of
atoms. Any other statement in a program will. We have modified iffPP (and
therefore the related operational semantics of abductive logic programs) to
support a differential treatment of certain predicate definitions. When
unfolding the expression C, in if C then ..., the involved predicates are not
allowed to contribute more abducibles, but simply to test those previously
collected in order to satisfy the definitions. Thus, the expression
if C then ... in OPENLOG is more than a mere shorthand for a set of clauses in
an abductive logic program. It is a way for the OPENLOG programmer to state
which part of the code must carry out tests (on the agent’s knowledge) and
which must lead to actions by the agent. This strategy adds expressiveness to
the programming language and makes abduction a practical approach for the
planning module of an agent [6].
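
The difference between the two treatments can be pictured with a deliberately
tiny Python sketch. It is not the modified iffPP: atoms are plain strings, and
the function condition_holds with its parameters known, assumed and inhibited
is a hypothetical stand-in for the unfolding of a condition.

```python
def condition_holds(atom, known, assumed, inhibited):
    """Evaluate one atom of a condition.

    known     -- atoms the agent already believes (e.g. its observations)
    assumed   -- atoms abduced so far while planning (updated in place)
    inhibited -- True when the atom comes from the C of 'if C then ...'
    """
    if atom in known or atom in assumed:
        return True        # a test against what was previously collected
    if inhibited:
        return False       # tests never contribute new abducibles
    assumed.add(atom)      # elsewhere, the missing atom is hypothesised
    return True


known = {"infront(b)"}     # situation b): block a is not there

free = set()
print(condition_holds("infront(a)", known, free, inhibited=False), free)

tested = set()
print(condition_holds("infront(a)", known, tested, inhibited=True), tested)
```

In the uninhibited call the reasoner assumes infront(a) and would then have to
find repairing actions that make it true; in the inhibited call the test simply
fails and nothing is added to the plan.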

With the inhibited platform and the code in example 1 above we state that, at
that stage, the agent is just interested in testing whether infront(A) actually
holds for some block A. If the programmer decides that the agent must also
build the mountain to be climbed, then she will have to write for the
“climber-builder” agent a program such as this:

Example 2.

proc climb
begin
   if infront( Block ) and currentlevel( Level )
      and Block is_higher_than Level then
   begin
      step_on( Block ) ; climb
   end
   else
      if available( Block ) and not infront( Block ) then
         put_block_in_front( Block ) ; climb
end

In this second program, when the agent has no block in front (so that the first
test fails) and there is some block available in the neighbourhood, then the
agent will indeed schedule (abduce) the action put_block_in_front(A) for
execution (provided that action is a primitive action).

Thus, with inhibited abduction the agent is interleaving the “testing” of
properties with the “planning” of actions. This testing is program-driven, i.e.
the programs and the goals establish when the system will be testing and when
it will be planning (abducing). Moreover, notice that the “testing” is not
restricted to the current state of the world. Earlier actions in a plan can be
used to establish that some property holds at a certain time-point. For
instance, the climbing agent above may be able to deduce that after
do(step_on(a), t1), infront(c) will hold.



4 OPENLOG: From Structured to Logic Programming

In the following, a well-known programming language (STANDARD PASCAL) 
is used as the basis to create a language that supports the kind of open problem- 
solving and planning behaviour mentioned above. The semantics of the resulting 
language (OPENLOG) is based on a logic of actions and events that caters for 
input assimilation and reactivity. In combination with the reactive architecture 
described in [6], where the interleaving of planning and execution is clearly de- 
fined, this language can provide a solution to the problem of agent specification 
and programming. 

OPENLOG is aimed at the same applications as the language GOLOG of Levesque et
al. [21], i.e. agent programming. Our approach differs from that of Levesque et
al. in that there is no commitment to a particular logical formalism. One can
employ the Situation Calculus or the Event Calculus depending on the
requirements of one’s architecture. However, the Event Calculus has turned out
to be more expressive and useful for the reactive architecture described in
[6].

Like GOLOG, our approach also regards standard programming constructs as
macros. However, here they are treated as special predicates or terms². There
is no problem with recursive or global procedures. Procedures are like
predicates that can be referred to (globally and recursively or
non-recursively) from within other procedures. Interpreting these macros is, in
a sense, like translating traditional structured programs into normal logic
programs.

The following section 5 describes the syntax of the language which is,
basically, a subset of PASCAL extended with operators for parallel execution.
Section 6 explains the semantics of OPENLOG by means of a logic program
(defining the predicate done). In section 7, we introduce the background
theories: the temporal reasoning platform on which OPENLOG semantics is based.
In [6], we illustrate the use of OPENLOG and the background theories with a
more elaborate example: The Elevator Controller.

² See [DN01] in table 2 below: proc can be regarded as a two-argument
predicate, the following symbol is a term, and begin and end bracket a more
complex term.
5 The Syntax of OPENLOG

The syntax of OPENLOG is described in BNF form³ in table 1.

The syntax is left “open” to accommodate, in suitable syntactic categories,
those symbols designated by the programmer to represent fluents, primitive
actions and complex actions. In addition to the syntactic rules, the system
must also provide translations between the “surface syntax” that the programmer
will use to write each Query and the underlying logical notation.

In this initial formalization, PASCAL syntax is limited to the least number of
structures required for structured programming: (“;”, “if .. then .. else ..”,
“while”). On the other hand, the syntax supports the representation of parallel
actions through the compositional operators par⁴ and +⁵.



6 The Semantics of OPENLOG 

The semantics of the language is stated in table⁶ 2 by means of the predicate
done⁷. The definition of done can also function as an interpreter for the
language. Declaratively, done(A, T0, Tf) reads “an action of type A is started
at T0 and completed at Tf”. As the definition of done is a logic program, any
semantics of normal logic programming can be used to give meaning to OPENLOG
programs.

One of the innovations in OPENLOG is that between any two actions in a sequence
it is always possible to “insert” a third event without disrupting the
semantics of the programming language. Axiom [DN02] formalizes this
possibility. This is what we mean by plans (derived from OPENLOG programs)
being open to updates.
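
The openness of sequences can be pictured with a short Python sketch. It is not
the definition of done given in table 2: it only checks, over a recorded
history of events, whether a sequence of primitive actions by the agent was
completed while tolerating foreign events in between. The function name
done_sequence, the event encoding and the use of list positions as time points
are assumptions made for this illustration.

```python
def done_sequence(actions, history, start=0):
    """Return the position just after the last matched action if `actions`
    occur in order in `history` from `start` onwards, allowing other events
    to be inserted between them; return None if the sequence is not completed.

    history is a list of (agent, action) pairs ordered by time.
    """
    pos = start
    for action in actions:
        # Skip any events inserted between two actions of the sequence.
        while pos < len(history) and history[pos] != ("self", action):
            pos += 1
        if pos == len(history):
            return None
        pos += 1
    return pos


history = [
    ("self", "step_on(a)"),
    ("somebodyelse", "remove(d)"),   # an inserted third event
    ("self", "step_on(c)"),
]
print(done_sequence(["step_on(a)", "step_on(c)"], history))
```

The sequence step_on(a) ; step_on(c) still counts as done even though another
agent acted between its two steps, whereas the reversed sequence does not.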

The definition of semantics in table 2 needs to be completed with a “base 
case” clause for the predicate done and the definition of holds. These two ele- 
ments are part of the semantics, but they are also the key elements of a back- 
ground theory B. 

³ In the table, S_j means an instance of S of sub-type j. (A)* indicates zero
or more occurrences of category A within the brackets.

⁴ Unlike those semantics of interleaving ([15], [24]), this is a form of real
parallelism. Actions start simultaneously, although they may finish at
different times. Notice that when all the actions have the same duration (or
when they are all “instantaneous”) this operator is equivalent to +. Also,
observe that the agent architecture described in [18] only handles actions
which last for one unit of time. We relax this limitation in [6].

⁵ + is used as well to express real parallelism. Actions start and finish at
the same time. This allows the programmer to represent actions that interact
with each other so that the finishing time of one constrains the finishing time
of the other. For instance, taking a bowl full of soup with both hands and
avoiding spilling [32].

⁶ PROLOG-like syntax is being used.

⁷ The definitions of other predicates are also required but are not
problematic.
Program       ::= Proc ( Program )*                        A program 
Proc          ::= proc Func_proc begin Commands end        Procedure definition 
Block         ::= begin Commands end                       Block 
Commands      ::= Block                                    Block call 
                | Func_proc                                Procedure call 
                | Func_action                              Primitive action call 
                | Commands ; Commands                      Sequential composition 
                | Commands par Commands                    Parallel composition 
                | Commands + Commands                      Strict parallel composition 
                | if Expr_boolean then Commands            Test 
                | if Expr_boolean then Commands 
                  else Commands                            Choice 
                | while Expr_boolean do Block              Iteration 
Query                                                      Logical expressions 
Expr_j        ::= Func_j (Func, Func, ..., Func)           Expressions 
Func          ::= Func_proc | Func_action 
                | Func_fluent | Func_boolean               Functors 
Func_proc     ::= serve(Term), build(Term), ...            User-defined names 
Func_action   ::= nil                                      Null action 
                | up | move(Term, Term) | ...              User-defined primitive actions' names 
Func_fluent   ::= at(Term) | on(Term, Func_fluent) | ...   User-defined fluents 
Func_boolean  ::= and(Func_fluent, Func_boolean) 
                | or(Func_fluent, Func_boolean) 
                | not(Func_boolean) 
                | Func_fluent                              Boolean functions 
                | Query                                    Tests on “rigid” information 
Term          ::= Ind | Var                                Terms can be individuals or variables 
Ind                                                        Individuals identified by the user 
Var           ::= ...                                      Sorted variables 

Table 1. The Syntax of OPENLOG. 
[DN01]  done(Pr, To, Tf) ← proc Pr begin C end ∧ done(C, To, Tf) 

[DN02]  done((C1 ; C2), To, Tf) ← done(C1, To, T1) ∧ T1 < T2 ∧ done(C2, T2, Tf) 

[DN03]  done((C1 par C2), To, Tf) ← done(C1, To, T1) ∧ done(C2, To, Tf) ∧ T1 < Tf 
                                  ∨ done(C1, To, Tf) ∧ done(C2, To, T1) ∧ T1 < Tf 

[DN04]  done((C1 + C2), To, Tf) ← done(C1, To, Tf) ∧ done(C2, To, Tf) 

[DN05]  done((if E then C1), To, Tf) ← holdsAt(E, To) ∧ done(C1, To, Tf) 
                                     ∨ ¬holdsAt(E, To) ∧ To = Tf 

[DN06]  done((if E then C1 else C2), To, Tf) ← holdsAt(E, To) ∧ done(C1, To, Tf) 
                                             ∨ ¬holdsAt(E, To) ∧ done(C2, To, Tf) 

[DN07]  done((while ∃L (Eb(L) do B(L))), To, Tf) 
            ← (¬∃L holdsAt(Eb(L), To) ∧ To = Tf) 
            ∨ (holdsAt(Eb(L'), To) ∧ done(B(L'), To, T1) ∧ To < T1 
               ∧ done((while ∃L (Eb(L) do B(L))), T1, Tf)) 

[DN08]  done((begin C end), To, Tf) ← done(C, To, Tf) 

[DN09]  done(nil, To, To) 

[DN10]  holdsAt(and(X, Y), T) ← holdsAt(X, T) ∧ holdsAt(Y, T) 

[DN11]  holdsAt(or(X, Y), T) ← holdsAt(X, T) ∨ holdsAt(Y, T) 

[DN12]  holdsAt(not(X), T) ← ¬holdsAt(X, T) 

[DN13]  holdsAt(X, T) ← nonrigid(X) ∧ holds(X, T) 

[DN14]  holdsAt(Q, T) ← rigid(Q) ∧ Q 

[DN15]  nonrigid(X) ← isfluent(X) 

[DN16]  rigid(X) ← ¬isfluent(X) 

Table 2. The Semantics of OPENLOG 
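The reading of done for the composition operators can be illustrated with a small sketch. It is not the abductive interpreter of the paper: it assumes that a plan is already given as a finite set of ground do facts and only checks which intervals satisfy done. The function name, the tuple encoding of commands and the action names up, move and beep are assumptions made for the illustration, and the comparisons follow the clauses of Table 2 as printed.

```python
from itertools import product

def done(cmd, plan):
    """Return the set of (t_start, t_finish) pairs for which done(cmd, To, Tf) holds.

    cmd  -- a primitive action name (str) or a tuple (op, c1, c2)
            with op in {"seq", "par", "plus"}
    plan -- a set of ground facts (action, t_start, t_finish), read as do/3
    """
    if isinstance(cmd, str):
        # Base case: a primitive action is done when the plan contains it.
        return {(s, f) for (a, s, f) in plan if a == cmd}
    op, c1, c2 = cmd
    d1, d2 = done(c1, plan), done(c2, plan)
    pairs = list(product(d1, d2))
    if op == "seq":
        # [DN02]: C1 finishes strictly before C2 starts.
        return {(s1, f2) for (s1, f1), (s2, f2) in pairs if f1 < s2}
    if op == "par":
        # [DN03]: same starting time, the later finishing time is the result.
        first = {(s1, f2) for (s1, f1), (s2, f2) in pairs if s1 == s2 and f1 < f2}
        second = {(s1, f1) for (s1, f1), (s2, f2) in pairs if s1 == s2 and f2 < f1}
        return first | second
    if op == "plus":
        # [DN04]: same starting time and same finishing time.
        return d1 & d2
    raise ValueError(f"unknown operator: {op}")

plan = {("up", 1, 2), ("move", 4, 6)}
program = ("seq", "up", "move")
print(done(program, plan))

# Inserting a third event between the two actions leaves the result unchanged.
print(done(program, plan | {("beep", 3, 3)}))
```

Both calls give the same interval, which is the sense in which plans derived from [DN02] stay open to updates.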
7 Background Theories 

Roughly, a background theory (B) is a formal description of actions and properties 
and the relationships between action-types and property-types. 

A background theory consists of two sub-theories. The first is a set of domain 
independent axioms (DIB) (notably the base case of done and the definition of holds) 
stating how actions and properties interact. These domain independent axioms 
also describe how persistence of properties is handled in the formalism. 

The other component of the background theory is a set of domain dependent 
axioms (DDB), describing the particular properties, actions and 
inter-relationships that characterize a domain of application (including the 
definitions of initiates, terminates and isfluent). 

The semantics for OPENLOG can be isolated from the decision about what 
formalism to use to represent actions and to solve the frame problem (the 
problem of persistence of properties) in the background theory. Formulations based 
on the Event Calculus [20] and on the Situation Calculus [22]⁸ are equally 
possible. The following one is based on the Event Calculus. 

Probably, the most important element in a background theory is the defini- 
tion of the temporal projection predicate: holds. 

7.1 The Projection Predicate in the Event Calculus 

holds(P, T) ← do(A, T', T1) ∧ initiates(A, T1, P) 
              ∧ T1 < T ∧ ¬clipped(T1, P, T)                 [EC1] 

clipped(T1, P, T2) ← do(A, T', T) ∧ terminates(A, T, P) 
                     ∧ T1 < T ∧ T < T2                      [EC2] 

These axioms are different from most formulations of the EC (in particular 
[19]) in that the well-known predicate happens(Event, Time) is replaced by the 
predicate do(Action, Starting-Time, Finishing-Time)⁹. 
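As a rough illustration of how [EC1] and [EC2] project a property over time, the following sketch evaluates holds against a fixed, ground set of do facts. The light-switch domain (switch_on, switch_off, light) is hypothetical and not taken from the paper, and the comparisons are the ones printed in the two axioms.

```python
def clipped(t1, prop, t2, plan, terminates):
    """[EC2]: an action finishing at T, with t1 < T < t2, terminates prop."""
    return any(terminates(a, f, prop) and t1 < f < t2 for (a, _start, f) in plan)

def holds(prop, t, plan, initiates, terminates):
    """[EC1]: an action finishing at T1 < t initiates prop and prop is not clipped since."""
    return any(
        initiates(a, f, prop) and f < t and not clipped(f, prop, t, plan, terminates)
        for (a, _start, f) in plan
    )

# Hypothetical domain-dependent axioms: a switch that turns a light on and off.
def initiates(action, _time, prop):
    return action == "switch_on" and prop == "light"

def terminates(action, _time, prop):
    return action == "switch_off" and prop == "light"

# Ground do(Action, Starting-Time, Finishing-Time) facts.
plan = {("switch_on", 1, 2), ("switch_off", 5, 6)}

print(holds("light", 4, plan, initiates, terminates))   # light is on at time 4
print(holds("light", 8, plan, initiates, terminates))   # light was clipped at time 6
```

The property starts to hold strictly after the finishing time of the initiating action, which is why the finishing time, and not the starting time, is passed to initiates and terminates.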

7.2 The Base-Case of done in the Event Calculus 

As we said before, we use iffPP for interpreting OPENLOG programs and 
generating plans. The execution of those plans is interleaved with their generation 
and also with the assimilation of inputs from the environment ([18], [6]). It is 
known ([11], [31], [25]) that to make an abductive theorem prover [33] behave 
as a planner, one has to define properly the set of abducibles, say Ab. In the 

⁸ In this case with a certain sacrifice in expressiveness, however. The operators + and 
par would have to be excluded from the language as it is. 

⁹ The intention is to have the name of the agent also represented by a term in the 
predicate: do(Agent, Action, Starting-Time, Finishing-Time). For the sake of 
simplicity, however, the term for agents is omitted here. 



OPENLOG: A Logic Programming Language Based on Abduction 289 



present context one can make Ab = {do, <, ≤, =}. The background theory can 
then be completed with the following definition (the base case of done): 

done(A, To, Tf) ← primitive(A) ∧ do(A, To, Tf)              [DNEC0] 



Notice that we do not include the predicate preconds(A, To) in [DNEC0]. 
Strictly speaking, one should be “testing” the preconditions of action A at this 
point. We, however, leave to the programmer the job of testing preconditions 
within OPENLOG code (i.e. if C then .. expressions). 

7.3 How to Achieve the Inhibition of Abduction 

As can be seen, the projection predicate holds is involved in the interpretation 
of every conditional expression in OPENLOG. Thus, to inhibit abduction, we 
simply establish that no do atom “derived” by unfolding a holds atom will be 
abduced. In this way, the holds predicate is used for “testing”, whereas the base 
case of done is used for generation of plans, as we explained above. 

8 Discussion 

OPENLOG is a logic programming language that can be used to write procedural 
code which can be combined with a declarative specification of a problem domain 
(a background theory). 

To define the language, a logical characterization has been given to the 
traditional programming structures (if then else, while, ;, ...) in such a way that 
any program written with those structures can be translated into a set of logical 
sentences. 

This mapping from procedural code to logical sentences is not only sought 
for the sake of clarity. The logic chosen to provide semantics for the procedural 
structures can also be used to specify a theory of actions that models dynamic 
universes [6]. This theory of actions can be based on Kowalski and Sergot’s Event 
Calculus [20], a logical formalism with an ontology based on events and 
properties that can be initiated and terminated by events. The Event Calculus provides 
a solution to the Frame Problem and also permits the efficient representation of 
concurrent activities and continuous domains. This has permitted the extension 
of the capabilities of standard PASCAL to allow for the description of parallel 
actions in OPENLOG programs. 

Thus, the designer/programmer is offered a specification-implementation lan- 
guage that can be used to model complex universes and also to write high-level 
algorithms to guide the activities of agents acting in a dynamic environment. 

As in other logic programming languages, programs in OPENLOG are 
processed by a theorem prover. Unlike in other approaches, however, programs in 
OPENLOG are intended to be interpreted¹⁰ rather than compiled¹¹. The reason 
for this is crucial. The process of planning (the theorem prover transforming 
goals into plans) must be interleaved with the execution of those plans and the 
inputting and assimilation of observations. One has to expect many 
modifications and amendments of the plans. The system as a whole will process inputs 
as soon as it can, increasing its chances of an opportune response (normally by 
a minor adjustment to its plans as illustrated in [6]). The first practical 
consequence of this is that the system will generate and use partial plans which 
it will refine progressively as its knowledge of the environment increases. This 
is a crucial difference between OPENLOG’s aims and those of a similar 
logic-based programming language: GOLOG [21]. We have explored the similarities 
and differences between GOLOG and a previous version of OPENLOG in [5]. 

¹⁰ As in JAVA [23] and other commercial products, where code is pre-compiled to an 
intermediate form to be read by an interpreter/executive. 

¹¹ As in Situated Agents [29] and GOLOG [21]. 

Partial planning may seem atypical in the current context because theorem 
provers are normally backward-reasoning mechanisms. An interesting aspect of 
the representation here discussed is that it supports planning by searching the 
time line in a forward direction. This is called progression. The representational 
strategy that supports this form of planning is not new. It is at the core of a 
well known device to specify grammars and to program their parsers: Definite 
Clause Grammar or DCG [26]. OPENLOG programs are like DCGs in that 
they both are higher level macros that can be completely and unambiguously 
translated into logic programs. Unlike DCGs, however, OPENLOG provides for 
negative literals. 

There is another critical difference between OPENLOG and DCGs. In DCGs, 
the “state of the computation” (which in that case contains the sentence being 
parsed) is carried along through arguments as is common in stream logic pro- 
gramming. This has the inconvenience of requiring the explicit representation of 
all objects in the application domain and is, therefore, cumbersome and limiting 
(we tested the approach in the prototypical implementation of pathfinder 
reactive automata that do forward planning, reported in [4]). Background theories 
are a flexible and powerful alternative to this approach. 



9 Conclusions and Further Research 

OPENLOG is a logic programming language. In OPENLOG one can write pro- 
cedural code combined with a declarative specification of a dynamic domain (a 
background theory) to guide an agent at problem-solving in that domain. 

The interpreter of OPENLOG is an abductive proof procedure which can be 
used to implement the planning module of an agent [6]. One innovative aspect 
of this work is that the agent processing and executing OPENLOG programs 
will stay open to the environment and will allow for changes in its environment 
and assimilation of new information generated by these changes. 

Another novelty in this work is that we use a logic program (the definition 
of done and the other predicates) to specify the semantics of an imperative 
programming language. The semantics is provided as a mapping that links the 
semantics of the imperative code with any semantics for abductive logic pro- 
grams. The definition of done has some other operational advantages. It can 
serve as an interpreter for OPENLOG, thus providing its operational 
semantics as well. And it can be used to “inhibit” the abductive proof procedure and 
prevent the over-generation of abducibles, which would make abduction an 
impractical approach for building the planning module of an agent. 

We are exploring the relationship between OPENLOG and programming 
with integrity constraints [6]. Also in [6], “the Elevator example” is borrowed 
from [21] and developed with OPENLOG. We plan to use OPENLOG as 
the programming language for each agent in a platform to simulate multi-agent 
systems. 
Abstract. Starlog is a temporal logic programming language that 
supports declarative specification of reactive systems, input-output 
behaviour and destructive updates. This paper presents an operational 
semantics for Starlog. Its correctness and completeness with respect to 
a model semantics are proved. 
1 Introduction 

Starlog is a temporal logic programming (TLP for short) language. It evolved 
from such applications as simulation [9] and deductive databases [28]. These 
applications require temporal relationships between objects to be specified in a 
precise and direct manner. It is this requirement which dictates the design and 
implementation of Starlog. 

While Starlog is similar in many ways to deductive databases it is intended 
as a full general programming language. This means in particular that two as- 
sumptions of deductive databases are untrue. The first is that the logic program 
can be finitely stratified on the predicate names. This is replaced by a more 
general ordering using integer timestamps and predicate names. The second as- 
sumption that is violated is the Datalog assumption that terms can only be of 
some finite depth. Starlog allows both unbounded terms and constraints. This 
paper is intended to provide a precise operational semantics for such a general 
bottom-up logic programming language. 

Starlog uses the syntax of the constraint logic programs (CLP hereafter) [21]. 
Starlog is a CLP language with arithmetic constraints over integers and 
equality/disequality constraints over terms. This is different from other TLP languages 
that are based on a particular temporal logic [1,2,4,14,32,47]. Unlike other CLP 
languages such as CLP(R) [22] and BNR Prolog [33], Starlog programs are ex- 
ecuted bottom-up. This is suitable for its intended applications which often use 
the specification of a real world system to construct a temporal model of the 
system. 

Over the past decade, a working prototype implementation of Starlog has 
been developed and challenging applications have been written in Starlog. This 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 294-310, 1999. 
paper provides Starlog with a formal operational semantics. Its purpose is 
twofold. Firstly, while the present implementation works experimentally, it is 
important that the implementation be verified with respect to a formal semantics. 
Secondly, the present implementation is rather inefficient. To improve its effi- 
ciency, it is necessary to perform various semantic based program analyses, for 
which a formal operational semantics is a necessity. 

This paper presents an operational semantics for Starlog. Its correctness and 
completeness with respect to its model semantics are given. The operational 
semantics is a bottom-up execution mechanism which deals with negative literals 
in the same way as positive ones. 

As programs in TLP languages such as Templog [1], Tokio [2], Temporal 
Prolog [14], Tempura [32] and Chronology [47] can be translated into CLP pro- 
grams [7,34], our operational semantics offers a bottom-up execution mechanism 
for these TLP languages which use extensions of the SLD resolution as opera- 
tional semantics. 

Temporality is expressed in Starlog by explicitly timing truth. The opera- 
tional semantics works on the class of Starlog programs that can be stratified in 
terms of time and predicate symbols. It generates in time order those facts whose 
ground instances are in the model semantics of the program. It repeatedly 
generates a fact using the program and transforms the program. Time-orderedness 
and stratification guarantee the correctness and the completeness of the 
operational semantics. Time-orderedness is essential in temporal applications such as 
simulation. 

The rest of this paper is organised as follows. Section 2 introduces a sub- 
set of Starlog. The subset is the core of Starlog and is chosen to simplify the 
presentation of the paper. Section 3 introduces the notion of temporally strat- 
ified Starlog programs and defines its model semantics, and section 4 presents 
the operational semantics and gives its correctness and completeness. Section 5 
concludes the paper and compares our operational semantics with related work. 
We assume that the reader is familiar with the terminology of constraint logic 
programming [21]. 

2 Starlog Language 

Starlog was developed for specification and implementation of applications which 
require temporal reasoning. Starlog doesn’t directly support temporal operators. 
However, most temporal operators can be programmed in Starlog as indicated 
in [7]. Moreover, Starlog allows more explicit temporal relationships to be ex- 
pressed directly. 

As a CLP language, Starlog can adopt any model semantics developed for 
the CLP scheme. This paper defines the model semantics of Starlog based on 
the stable model semantics [17]. Unlike other CLP languages such as 
[22,10,12,11,30,5,46], Starlog uses an explicit parameter for time. This can be 
thought of as putting timestamps on truth values. 
2.1 Syntax 

Terms and constraints are formed as in other CLP(X) languages. Constraints 
are arithmetic constraints over integers and equality/disequality constraints over 
terms. We will use $\mathcal{D}$ to denote the underlying domain structure which 
interprets arithmetic constraints over integers and equality/disequality constraints 
over terms in the usual way [6,8]. There is no complete constraint solver for 
arbitrary integer arithmetic constraints. However, there are powerful decidable 
subsets of integer arithmetic constraints [27,31,19,39,20,44,41,18]. This paper 
focuses on decidable subsets of integer arithmetic constraints and assumes that 
$\mathcal{D}$ is satisfaction complete. The assumption, which can be relaxed, helps us to 
separate the issue of the completeness of the constraint solver from that of the 
completeness of the operational semantics itself. Substituting a decidable 
domain of integer arithmetic constraints for $\mathcal{D}$ will result in a particular instance 
of Starlog. 

An atom is defined to be of the form $p(s_1, \ldots, s_n)@t$ where $t$, called 
the timestamp, is a term consisting of variables, integers and arithmetic 
operators while each $s_i$ is an arbitrary term. A literal is either an atom or the negation of 
an atom. A clause is of the form $h \leftarrow \delta, L$ where $\delta$ is a constraint, $h$ an atom and 
$L$ a conjunction of literals. A Starlog program is a finite set of clauses. A clause 
without any body literal but possibly including constraints is called a fact while 
other clauses are called rules. 



2.2 Causality 

Causality is natural in temporal reasoning and is also a useful assumption which 
simplifies Starlog programming. Some other TLP languages also assume the 
causality of programs [4]. Causality means that no truth in the past is defined 
in terms of truth in the future. Thus, the following clause fails in Starlog. 

retrospective_reasoning@T <- 
    T=S-1000, current_finding@S. 

Formally, a clause is causal if, in any $\mathcal{D}$-model of the clause, the timestamp of 
its head is no less than the timestamp of any literal in its body. Starlog implicitly 
adds causality constraints to program clauses. 



2.3 Examples 

Time in Starlog is discrete and positive. The following program defines a predi- 
cate even that is true at even time points and thus generates even numbers. 

% even numbers program. 

even@0. 

even@T <- T=S+1, not(even@S). 
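The bottom-up, time-ordered reading of this program can be mimicked by a short loop. This is only a sketch of the idea, assuming a finite bound on the timestamps; the function name and the bound are not part of Starlog.

```python
def even_times(limit):
    """Bottom-up evaluation of the even program for timestamps 0..limit."""
    true_at = set()
    for t in range(limit + 1):          # facts are produced in time order
        if t == 0:
            true_at.add(0)              # even@0.
        elif (t - 1) not in true_at:
            true_at.add(t)              # even@T <- T=S+1, not(even@S).
    return sorted(true_at)

print(even_times(10))
```

Because the negative literal refers to the previous time point, its truth value is already settled when timestamp T is reached, so the recursion through negation causes no difficulty.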
The following program generates prime numbers as a time sequence using the 
predicate prime. It also generates non-prime numbers by the predicate mult. T >= 2, 
T = J * K, K >= J and J >= 2 are constraints over integers. 

% prime numbers program. 

prime@T <- T>=2, not(mult@T). 

mult@T <- T=J*K, K>=J, J>=2, prime@J. 
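A bounded, time-ordered evaluation of the two rules might look as follows. The sketch assumes a finite upper bound and decides mult before prime at each timestamp, since prime depends negatively on mult at the same time point; the function name is an assumption for the illustration.

```python
def primes_and_mults(limit):
    """Evaluate the prime numbers program bottom-up for timestamps 2..limit."""
    prime, mult = set(), set()
    for t in range(2, limit + 1):
        # mult@T <- T=J*K, K>=J, J>=2, prime@J.
        # Every candidate J is smaller than T, so prime@J is already decided.
        is_mult = any(t % j == 0 and t // j >= j for j in prime)
        if is_mult:
            mult.add(t)
        else:
            # prime@T <- T>=2, not(mult@T).
            prime.add(t)
    return sorted(prime), sorted(mult)

primes, mults = primes_and_mults(20)
print(primes)
print(mults)
```

The positive literal prime@J always refers to a strictly earlier timestamp, while the negative literal mult@T refers to the same timestamp, which is the situation the temporal stratification of Section 3 is designed to allow.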



2.4 Normalised Programs 

In the sequel, we shall only be concerned with normalised programs. A nor- 
malised program is a set of normalised clauses. A clause is normalised if its head 
and every atom occurring in its body are of the form $p(X_1, \ldots, X_n)@T$ where 
$X_1, \ldots, X_n$ and $T$ are different variables. It is obvious that corresponding to each 
program, there is a semantically equivalent normalised program. The causality 
constraints implicit in a normalised Starlog program can be easily added. 

2.5 Notations 

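Let $C = (h@T \leftarrow \delta, L)$ and $t$ be an integer. Define 
$C^{>t} \triangleq (h@T \leftarrow (T > t), \delta, L)$ and $C^{<t} \triangleq (h@T \leftarrow (T < t), \delta, L)$. 
Let $\mathcal{P}$ be a program. $\mathcal{P}^f$ denotes the set of facts in $\mathcal{P}$ and $\mathcal{P}^r$ the set of rules 
in $\mathcal{P}$, so that $\mathcal{P} = \mathcal{P}^f \cup \mathcal{P}^r$. We sometimes write a clause as 
$h@T \leftarrow \delta, \wedge_{j \in J} a_j@S_j, \wedge_{k \in K} not(a_k@S_k)$ with $J$ and $K$ being 
disjoint sets of indices. $\mathcal{P}^-$ denotes the set of those rules in $\mathcal{P}^r$ which have only 
negative literals, and $\mathcal{P}^+$ denotes $\mathcal{P}^r \setminus \mathcal{P}^-$. Rules in $\mathcal{P}^-$ are called negative and 
those in $\mathcal{P}^+$ positive. Let $\delta$ be a constraint. Define $sat(\delta) \triangleq (\mathcal{D} \models \exists.\delta)$. $sat(\delta)$ 
is true iff $\delta$ is satisfiable with respect to $\mathcal{D}$. Let $Q$ be a set of clauses. Define 

$$[Q]_{\mathcal{D}} \triangleq \{\mu(h \leftarrow L) \mid (h \leftarrow \delta, L) \in Q \wedge (\mu \text{ is a valuation}) \wedge \mathcal{D} \models \mu(\delta)\}$$ 

We write $[\{S\}]_{\mathcal{D}}$ as $[S]_{\mathcal{D}}$ for simplicity. 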

3 Stratification and Model Semantics 

This section defines the class of temporally stratified programs and their model 
semantics, and lays the technical ground for the operational semantics of tem- 
porally stratified programs including negation. 

3.1 Temporally Stratified Programs 

Stratification has been a useful notion in formulating the semantics of logic 
programs with negation [3,15,38,37]. The idea of stratification is to disallow 
recursion through negation. In other words, stratification makes it impossible for 
a predicate to recursively invoke itself through negation. This is guaranteed by 



298 Lunjin Lu and John G. Cleary 



requiring that any predicate symbol occurring negatively in the body of a clause 
belongs to a lower stratum than the head predicate symbol and any predicate 
symbol occurring positively in the body belongs to a stratum no higher than the 
head predicate symbol. 

The timestamps in Starlog programs relax the above condition for 
stratification in that recursion through negation is allowed provided it is through 
decreasing timestamps. Let $\Pi_{\mathcal{P}}$ be the set of the predicate symbols in $\mathcal{P}$ and Nat 
be the set of natural numbers. Let strat be a function from $\Pi_{\mathcal{P}}$ to Nat. We extend 
strat as follows: $strat(p(s)) = strat(p)$ and $strat((h \leftarrow L)) = strat(h)$. 

Definition 1. A program $\mathcal{P}$ is temporally stratified if there is a function 
$strat : \Pi_{\mathcal{P}} \rightarrow Nat$ such that, for every rule 
$h@T \leftarrow \delta, \wedge_{j \in J} a_j@S_j, \wedge_{k \in K} not(a_k@S_k)$ in $\mathcal{P}$, 
for every $j \in J$, either $\mathcal{D} \models (\delta \rightarrow (T > S_j))$ or $strat(a_j) \leq strat(h)$, and for 
every $k \in K$, either $\mathcal{D} \models (\delta \rightarrow (T > S_k))$ or $strat(a_k) < strat(h)$. 

The above definition augments the traditional predicate stratification in the 
literature with time stratification. Procedure calls are primarily stratified on 
timestamps and secondarily on predicate symbols. It ensures that recursive calls 
through negation in a temporally stratified program involve time decrements. 
Under the assumption that $\mathcal{D}$ is satisfaction complete, temporal stratifiability is 
decidable. An algorithm for finding a predicate stratification function strat for 
a logic program in the literature, such as that in [45] (page 134), can be readily 
adapted for Starlog. 

3.2 Model Semantics 

We first recall the stable model semantics for logic programs [17] and then define 
the model semantics of temporally stratified Starlog programs. Let $\mathcal{G}$ be a logic 
program consisting of a set of ground clauses and $\mathcal{M}$ be a set of ground atoms. 
Then the Gelfond-Lifschitz transformation is defined as follows. 

$$GL(\mathcal{G}, \mathcal{M}) = \{H \leftarrow pos(L) \mid (H \leftarrow L) \in \mathcal{G} \wedge \mathcal{M} \models neg(L)\}$$ 

where $pos(L)$ is the conjunction of positive literals in $L$ and $neg(L)$ is the 
conjunction of negative literals in $L$. $GL(\mathcal{G}, \mathcal{M})$ is a definite logic program obtained 
by removing those clauses in $\mathcal{G}$ whose bodies contain the negation of an atom in 
$\mathcal{M}$ and deleting negative literals in other clauses in $\mathcal{G}$. 

A set $\mathcal{M}$ of atoms is a stable model of $\mathcal{G}$ if $\mathcal{M}$ is the least model of $GL(\mathcal{G}, \mathcal{M})$. 
If $\mathcal{G}$ is locally stratified then $\mathcal{G}$ has a unique stable model which is also the least 
model of $\mathcal{G}$. For locally stratified programs, the stable model semantics coincides 
with the perfect model semantics [36,35] and the well-founded semantics [16]. 
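For a finite ground program the transformation and the stable model test can be written down directly. The sketch below assumes a simple encoding in which a clause is a triple (head, positive body atoms, negated body atoms) and atoms are strings; the encoding and the function names are illustrative choices and do not come from the paper.

```python
def gl_reduct(program, m):
    """Gelfond-Lifschitz transformation of a ground program w.r.t. the atom set m.

    program -- list of clauses (head, pos_body, neg_body)
    Clauses with a negated atom in m are dropped; negative literals are
    deleted from the remaining clauses.
    """
    return [(h, pos) for (h, pos, neg) in program if not (set(neg) & set(m))]

def least_model(definite_program):
    """Least model of a definite ground program by naive fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_program:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def is_stable(program, m):
    """m is a stable model iff it is the least model of its own reduct."""
    return least_model(gl_reduct(program, m)) == set(m)

# Ground instances of the even numbers program for timestamps 0..3.
ground_even = [
    ("even@0", [], []),
    ("even@1", [], ["even@0"]),
    ("even@2", [], ["even@1"]),
    ("even@3", [], ["even@2"]),
]

print(is_stable(ground_even, {"even@0", "even@2"}))
print(is_stable(ground_even, {"even@0", "even@1"}))
```

The first candidate reproduces itself under the reduct and is therefore stable; the second does not, since its reduct derives even@3 and fails to derive even@1.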

Let $\mathcal{P}$ be a temporally stratified Starlog program. $[\mathcal{P}]_{\mathcal{D}}$ is a locally stratified 
logic program. Therefore, $[\mathcal{P}]_{\mathcal{D}}$ has one stable model which is also the least 
model of $[\mathcal{P}]_{\mathcal{D}}$. We take the unique stable model of $[\mathcal{P}]_{\mathcal{D}}$ as the canonical model 
semantics of $\mathcal{P}$, denoted as $CM(\mathcal{P})$. The operational semantics of temporally 
stratified Starlog programs computes a representation of $CM(\mathcal{P})$ in time order. 
A representation of $CM(\mathcal{P})$ is a set $\mathcal{F}$ of facts such that $[\mathcal{F}]_{\mathcal{D}} = CM(\mathcal{P})$. 
Let $f$ be a fact. Elements of $[f]_{\mathcal{D}}$ are called ground instances of $f$. Note that 
a ground instance of $f$ has an empty body and can be thought of as a ground atom. 
Let $\mathcal{M}$ be a set of ground atoms. We say that $f$ is contained in $\mathcal{M}$ if all ground 
instances of $f$ are in $\mathcal{M}$, i.e., $[f]_{\mathcal{D}} \subseteq \mathcal{M}$. 

3.3 Approximate Success Time 

Let $\mathcal{P}$ be the current program. A key step in the operational semantics is to 
determine the minimum timestamp that an atom in $CM(\mathcal{P})$ has. Because of 
negation, the minimum timestamp cannot be determined by the facts in $\mathcal{P}$ 
alone. Rules have to be taken into account. For example, the prime numbers 
program doesn’t contain any fact and yet $prime@T \leftarrow T \geq 2, T < 4$ is in its 
model semantics. 

As we require the operational semantics to generate facts in time order, we 
naturally expect that such a fact be generated from a clause that has the smallest 
success time, where the success time $m_C$ of a clause $C$ is defined as follows. Let 
$C$ be $h@T \leftarrow \delta, L$. Then 

$$m_C \triangleq \min\{t \mid \mathcal{D} \models \mu(\delta \wedge (T = t)) \wedge CM(\mathcal{P}) \models \mu(L) \wedge (\mu \text{ is a valuation})\}$$ 

$m_C$ is not computable but serves as a useful reference. The operational semantics 
uses a conservative approximation $\hat{m}_C$ to $m_C$ to choose a program clause from 
which the next fact is generated. 

$$\hat{m}_C \triangleq \min\{t \mid sat(\delta \wedge (T = t))\}$$ 

$\hat{m}_C$ is a conservative approximation to $m_C$ in that $\hat{m}_C$ never exceeds $m_C$. $\hat{m}_C$ is 
determined by $C$ alone. This is in contrast to $m_C$ which also depends on other 
clauses in $\mathcal{P}$. $\hat{m}_C$ is computable as $\mathcal{D}$ is satisfaction complete. If $C$ is a fact 
then $\hat{m}_C = m_C$. We define the approximate success time for a set of clauses as the 
minimum of the approximate success times of the clauses in the set. 
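To make the approximate success time concrete, the sketch below computes it for the two rules of the prime numbers program. It replaces the satisfaction-complete constraint solver assumed in the paper by a brute-force search over a bounded range of non-negative integers, which is only adequate for small illustrations; the function name, the encoding of constraints as Python predicates and the bound are all assumptions.

```python
from itertools import product

def approx_success_time(constraint, n_other_vars, bound):
    """Smallest t in 0..bound for which the clause constraint is satisfiable.

    constraint   -- predicate taking the head timestamp t followed by the
                    values of the other constraint variables
    n_other_vars -- number of variables other than the head timestamp
    bound        -- every variable ranges over 0..bound (brute-force stand-in
                    for a real constraint solver)
    Returns None when no timestamp in the range satisfies the constraint.
    """
    values = range(bound + 1)
    for t in values:
        if any(constraint(t, *vals) for vals in product(values, repeat=n_other_vars)):
            return t
    return None

# prime@T <- T>=2, not(mult@T).
def prime_constraint(t):
    return t >= 2

# mult@T <- T=J*K, K>=J, J>=2, prime@J.
def mult_constraint(t, j, k):
    return t == j * k and k >= j and j >= 2

print(approx_success_time(prime_constraint, 0, 20))
print(approx_success_time(mult_constraint, 2, 20))
```

Only the constraint of each clause is inspected, never its body literals, which is why the value is a lower bound on the true success time rather than the success time itself.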

3.4 Extracting Facts from Negative Rules 

Time-orderedness requires that each time a fact is generated it has the smallest 
success time. If the smallest approximate success time of the program happens 
to be that of a fact in the program then the situation is simple. It is also the 
smallest success time of any fact in the model semantics of the program. 

The situation becomes more complicated when the approximate success time 
of a rule is smaller than those of the facts in the program. In this case, there 
is a possibility that facts with success times smaller than those of the facts 
in the program can be generated by rules in the program, as shown later. We 
will develop a method to extract such facts from the program. The method is 
based on a few properties of temporally stratified programs that are detailed 
below. The following lemma states that if the success time of a fact in the model 
semantics of the program is smaller than those of the facts in the program then 
a negative rule in the program derives the fact. 
Lemma 1. Let $\mathcal{P}$ be a program and $a@t$ be a ground atom such that 
$a@t \in CM(\mathcal{P})$ and $t < m_{\mathcal{P}^f}$. Then there are a valuation $\nu$ and a negative rule 
$C = (h@T \leftarrow \delta, \wedge_{k \in K} not(a_k))$ such that $\mathcal{D} \models \nu(\delta)$, $\nu(a_k) \notin CM(\mathcal{P})$ and $\nu(T) \leq t$. 

Proof. By contradiction. Assume there were no negative rule 
$C = (h@T \leftarrow \delta, \wedge_{k \in K} not(a_k))$ and valuation $\nu$ such that $\mathcal{D} \models \nu(\delta)$, 
$\nu(a_k) \notin CM(\mathcal{P})$ and $\nu(T) \leq t$. Then $t' > t$ for each fact $h'@t'$ in 
$GL([\mathcal{P}]_{\mathcal{D}}, CM(\mathcal{P}))$. By the causality requirement, $a@t \notin CM(\mathcal{P})$ as $CM(\mathcal{P})$ is the 
least Herbrand model of $GL([\mathcal{P}]_{\mathcal{D}}, CM(\mathcal{P}))$, a contradiction. □ 

According to Lemma 1, if all negative rules have an approximate success time no less 
than $m_{\mathcal{P}^f}$ then the success time of the next fact to generate is $m_{\mathcal{P}^f}$. 

When $m_{\mathcal{P}^f} > \hat{m}_{\mathcal{P}^-}$, a fact can be extracted from a negative rule, as is stated 
in the following lemma. 

Lemma 2. Let $\mathcal{P}$ be a program such that $m_{\mathcal{P}^f} > \hat{m}_{\mathcal{P}^-}$ and 
$C = (h@T \leftarrow \delta, \wedge_{k \in K} not(a_k@S_k))$ be a negative rule in $\mathcal{P}$ such that $\hat{m}_C = \hat{m}_{\mathcal{P}^-}$ 
and $strat(C) \leq strat(C')$ for any other negative rule $C'$ in $\mathcal{P}$ with $\hat{m}_{C'} = \hat{m}_{\mathcal{P}^-}$. 
Then $[h@T \leftarrow \delta \wedge (T = \hat{m}_{\mathcal{P}^-})]_{\mathcal{D}} \subseteq CM(\mathcal{P})$. 

Proof. First consider the simple case where $C$ is the only negative rule 
whose approximate success time is equal to $\hat{m}_{\mathcal{P}^-}$. Let $\nu$ be an arbitrary 
valuation such that $\mathcal{D} \models \nu(\delta \wedge (T = \hat{m}_{\mathcal{P}^-}))$. The temporal stratification and 
causality requirements ensure that for each $k \in K$, either (a) $\nu(S_k) < \hat{m}_{\mathcal{P}^-}$ or (b) 
$(\nu(S_k) = \hat{m}_{\mathcal{P}^-}) \wedge (strat(a_k) < strat(h))$. In case (a), $\nu(a_k@S_k) \notin CM(\mathcal{P})$ 
by Lemma 1. In case (b), we also have $\nu(a_k@S_k) \notin CM(\mathcal{P})$, as shown in 
the following. We have $Pred(a_k) \neq Pred(h)$ where $Pred(a)$ is the predicate 
symbol of the atom $a$. If $Pred(a_k)$ is not defined by any positive rule then every 
clause for $Pred(a_k)$ has an approximate success time greater than $\hat{m}_{\mathcal{P}^-}$ because 
$C$ is the only negative rule whose approximate success time is equal to $\hat{m}_{\mathcal{P}^-}$. 
This implies $\nu(a_k@S_k) \notin CM(\mathcal{P})$. Now suppose that $Pred(a_k)$ is defined by 
a positive rule $a_k@S \leftarrow \delta', L$. Either $L$ contains a positive call to a predicate 
$q$ other than $Pred(h)$, implying $\nu(a_k@S_k) \notin CM(\mathcal{P})$ because the approximate 
success time of any negative rule for $q$ is greater than $\hat{m}_{\mathcal{P}^-}$, or every 
positive literal in $L$ is a call to $Pred(h)$, also implying $\nu(a_k@S_k) \notin CM(\mathcal{P})$ 
because such a call must involve a time decrement to satisfy the temporal 
stratification requirement. Thus, $\nu(a_k@S_k) \notin CM(\mathcal{P})$ in case (b). So, 
$(\mathcal{D} \models \nu(\delta \wedge (T = \hat{m}_{\mathcal{P}^-}))) \rightarrow \nu(a_k@S_k) \notin CM(\mathcal{P})$ for any valuation $\nu$, which implies 
$[h@T \leftarrow \delta \wedge (T = \hat{m}_{\mathcal{P}^-})]_{\mathcal{D}} \subseteq GL([\mathcal{P}]_{\mathcal{D}}, CM(\mathcal{P}))$. 

Now suppose that there is more than one negative rule whose approximate 
success time equals $\hat{m}_{\mathcal{P}^-}$. Each such clause defines a predicate in a stratum. 
Consider a negative rule $C = (h@T \leftarrow \delta, \wedge_{k \in K} not(a_k@S_k))$ that has the lowest 
stratum among these negative rules. The same reasoning as in the above 
paragraph leads to $[h@T \leftarrow \delta \wedge (T = \hat{m}_{\mathcal{P}^-})]_{\mathcal{D}} \subseteq GL([\mathcal{P}]_{\mathcal{D}}, CM(\mathcal{P}))$. 

The lemma follows because $CM(\mathcal{P})$ is a model of $GL([\mathcal{P}]_{\mathcal{D}}, CM(\mathcal{P}))$. □ 
4 Operational Semantics 



This section presents an operational semantics for temporally stratified pro- 
grams. 

Let $\mathcal{P}$ be the program. The operational semantics enumerates a 
representation of $CM(\mathcal{P})$ in time order. It repeatedly generates a fact $f$ and then uses $f$ to 
transform the current program $\mathcal{P}$ into a new program $\mathcal{P}'$. The transformation 
is done in such a way that any ground fact in $CM(\mathcal{P})$ is either a ground 
instance of $f$ or in $CM(\mathcal{P}')$, and that any fact in $CM(\mathcal{P}')$ is also in $CM(\mathcal{P})$. The 
operational semantics keeps track of a timer. The timer records the minimum 
timestamp that the next fact to generate could have. The operational semantics 
ensures that all the facts with a timestamp smaller than the timer and in the 
model semantics of the original program have been generated and that all other 
facts in the model semantics of the original program are in the model semantics 
of the current program. 

As the operational semantics discards generated facts, a newly generated fact must be propagated through each rule that matches the fact. The propagation results in several clauses which replace the rule through which the fact is propagated. Let $C = (h@T \leftarrow \delta, \bigwedge_{j \in J} a_j@S_j, \bigwedge_{k \in K} not(a_k@S_k))$ be a rule in $\mathcal{P}^r$, $f = (h' \leftarrow \delta')$ be the generated fact, and $t$ be the value of the timer when $f$ is generated. Then $C$ is replaced by a set of clauses obtained by (1) replacing each $a_i@S_i$ in the body of $C$ with $(a_i@S_i \wedge (S_i \geq t)) \vee ((a_i@S_i = \rho_i(h')) \wedge \rho_i(\delta'))$ where $\rho_i$ is a renaming substitution, (2) converting the resulting body into its disjunctive normal form and throwing away conjuncts with unsatisfiable constraints, and (3) for each remaining conjunct, producing a clause with $h@T$ as its head and the conjunct as its body. Formally,

$$prop((h@T \leftarrow \delta, \bigwedge_{j \in J} a_j@S_j, \bigwedge_{k \in K} not(a_k@S_k)), (h' \leftarrow \delta'), t) = \{ h@T \leftarrow \gamma, L \mid (\gamma, L) \in KK \wedge sat(\gamma) \}$$

with

$$KK = DNF\Big( \delta \wedge \bigwedge_{j \in J} \big( (S_j \geq t \wedge a_j@S_j) \vee ((a_j@S_j = \rho_j(h')) \wedge \rho_j(\delta')) \big) \wedge \bigwedge_{k \in K} \neg \big( (S_k \geq t \wedge a_k@S_k) \vee ((a_k@S_k = \rho_k(h')) \wedge \rho_k(\delta')) \big) \Big)$$

where $DNF(F)$ is the disjunctive normal form of $F$.

The definition of $prop(C, f, t)$ does not distinguish atoms that match the generated fact from those that do not. If $a_i@S_i$ does not match $h'$ then $(a_i@S_i \wedge (S_i \geq t)) \vee ((a_i@S_i = \rho_i(h')) \wedge \rho_i(\delta'))$ is equivalent to $a_i@S_i \wedge (S_i \geq t)$ and the effect is to strengthen the constraint part of $C$.
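To make the propagation step concrete, the following minimal Python sketch propagates a ground fact through a clause containing a single negative literal, mirroring the two clauses obtained in step (III) of Example 1. It is an illustration under simplifying assumptions, not the authors' implementation: constraints are modelled as Python predicates over an integer S, satisfiability is checked by brute force over a small finite window, and the names `satisfiable` and `propagate_negative` are hypothetical.

```python
def satisfiable(constraints, domain):
    """Brute-force check: does some S in `domain` satisfy every constraint?"""
    return any(all(c(s) for c in constraints) for s in domain)


def propagate_negative(constraints, fact_time, t, domain=range(0, 50)):
    """Propagate the ground fact a@fact_time through a clause whose body is
    `constraints` (predicates on S) together with the literal not(a@S).

    Returns one (constraints, keeps_negative_literal) pair per disjunct of
    the normal form whose constraint part is satisfiable on `domain`.
    """
    differs = lambda s: s != fact_time   # S <> fact_time
    before = lambda s: s < t             # S < t: not(a@S) is settled
    not_before = lambda s: s >= t        # S >= t: not(a@S) must be kept
    candidates = [
        (constraints + [differs, before], False),
        (constraints + [differs, not_before], True),
    ]
    return [(cs, neg) for cs, neg in candidates if satisfiable(cs, domain)]


# First iteration of Example 1: clause body S >= 0, not(even@S); fact even@0; t = 0.
clauses = propagate_negative([lambda s: s >= 0], fact_time=0, t=0)
```

Only the disjunct that keeps the negative literal survives here, which corresponds to the clause with `S<>0, S>=0, not(even@S)` in the example; the other disjunct has an unsatisfiable constraint part and is dropped.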

Let $a$ be an atom. We define $\tau_{a,\mathcal{P}}$ as the minimum of the approximate success times of the clauses in $\mathcal{P}$ whose heads have the same predicate symbol as $a$.



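Algorithm 1 Given a temporally stratified program $\mathcal{P}$ and its predicate stratification function $strat$, the algorithm enumerates in time order a representation of $CM(\mathcal{P})$.

- Initialisation.

  $t := 0$

  $\mathcal{P} := \{ h \leftarrow \delta, L \mid (h \leftarrow \delta, L) \in \mathcal{P} \wedge sat(\delta) \}$

- While $\mathcal{P} \neq \emptyset$ do

  (Ia) $t := m_{\mathcal{P}^F \cup \mathcal{P}^-}$

  (Ib) $\mathcal{P} := \mathcal{P}^F \cup \mathcal{P}^+ \cup \bigcup_{C \in \mathcal{P}^-} extr_t(C, \mathcal{P})$ where

  $$extr_t((h@T \leftarrow \delta, \bigwedge_{k \in K} not(a_k@S_k)), \mathcal{P}) = \{ h@T \leftarrow \gamma, L \mid (\gamma, L) \in KK \wedge sat(\gamma) \}$$

  with $KK = DNF(\delta, \bigwedge_{k \in K} ((S_k < \tau_{a_k,\mathcal{P}}) \vee ((S_k \geq \tau_{a_k,\mathcal{P}}) \wedge not(a_k@S_k))))$.

  (Ic) If $t \neq m_{\mathcal{P}^F}$ then $\mathcal{P} := \mathcal{P} \cup \{h@T \leftarrow \delta, (T = t)\}$, assuming that $C = (h@T \leftarrow \delta, L)$ is a negative rule in $\mathcal{P}$ such that $m_C = t$ and $strat(C) \leq strat(C')$ for any other negative rule $C'$ in $\mathcal{P}$ with $m_{C'} = t$.

  (II) Choose a fact $f = (h'@T' \leftarrow \delta')$ from $\mathcal{P}^F$ such that $sat((T' = t) \wedge \delta')$.

  (III) $\mathcal{P} := \mathcal{P}^F \setminus \{f\} \cup \bigcup_{C \in \mathcal{P}^r} prop(C, f, t)$

□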

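We sometimes write step (Ib) as $\mathcal{P} := extr_t(\mathcal{P})$ with $extr_t(\mathcal{P}) = \mathcal{P}^F \cup \mathcal{P}^+ \cup \bigcup_{C \in \mathcal{P}^-} extr_t(C, \mathcal{P})$. We also write step (Ic) as $\mathcal{P} := extr_p(\mathcal{P}, t)$. Define $extr(\mathcal{P}, t) = extr_p(extr_t(\mathcal{P}), t)$.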

Given a program $\mathcal{P}$ and a predicate stratification function $strat$ that can be obtained by a stratification algorithm [45] (page 134), the operational semantics first initialises the timer to 0 and removes the clauses with unsatisfiable body constraints. It then repeatedly generates a fact and transforms the current program into a new program as follows.

Step (Ia) sets the timer to the minimum time at which the body constraint of any fact contained in $CM(\mathcal{P})$ can be satisfied. The minimum time is determined by $\mathcal{P}^F$ and $\mathcal{P}^-$ according to lemmas 1 and 2 and is no less than the previous value of the timer. Step (Ib) extracts positive information from negative rules in $\mathcal{P}^-$ using time stratification as follows. Each negative literal $not(a@S)$ in a negative rule is replaced by

$$(S < \tau_{a,\mathcal{P}}) \vee ((S \geq \tau_{a,\mathcal{P}}) \wedge not(a@S))$$

Each such rule is then normalised, resulting in a set of clauses which replace the original negative rule. Step (Ic) extracts a fact from a negative rule using predicate symbol stratification. It ensures that there is always a fact to choose in step (II). Step (II) generates a fact $f$. Through invocation of $prop(C, f, t)$, step (III) replaces an atom $a_i@S_i$ with $(a_i@S_i \wedge (S_i \geq t)) \vee ((a_i@S_i = \rho_i(h')) \wedge \rho_i(\delta'))$. The first disjunct allows further solutions to $a_i@S_i$ to be considered while the second propagates the solution provided by $f$.
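As a rough illustration of how the timer in step (Ia) might be computed, the sketch below treats the approximate success time of a clause as the least time at which its body constraint is satisfiable. This reading is an assumption made for the example (the precise definition is given earlier in the paper), and the brute-force search over a finite integer window and the name `approx_success_time` are hypothetical.

```python
def approx_success_time(constraint, horizon=100):
    """Return the smallest T in [0, horizon) such that constraint(T, S)
    holds for some S in [0, horizon), or None if there is no such T."""
    for t in range(horizon):
        if any(constraint(t, s) for s in range(horizon)):
            return t
    return None


# Body constraint of the fact  even@T :- T=0.
fact_time = approx_success_time(lambda t, s: t == 0)

# Body constraint of the rule left after the first iteration of Example 1:
#   even@T :- T=S+1, S>=0, S<>0, not(even@S).
rule_time = approx_success_time(lambda t, s: t == s + 1 and s >= 0 and s != 0)
```

Under these assumptions the fact yields time 0 and the remaining rule yields time 2, which agrees with the timer values reported in Example 1.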

The operational semantics is nondeterministic. There might be several facts that can be extracted from negative rules in the current program at step (Ic). There might also be a number of facts whose success time is equal to the current value of the timer at step (II). The operational semantics nondeterministically chooses one at these steps.
Example 1. This example illustrates the operational semantics using the even numbers program. The following is the normalised even numbers program completed with causality constraints.

% even numbers program.
even@T :- T=0.

even@T :- T=S+1, S>=0, not(even@S).

The program is temporally stratified. The configuration after initialisation is illustrated in Figure 1(a).

As there is no positive literal in the body of either clause, $\mathcal{P}^r = \mathcal{P}^-$ throughout the execution of the program. The first iteration is as follows. The approximate success time of $\mathcal{P}^- \cup \mathcal{P}^F$ is 0. So, $t = 0$ after step (Ia). In step (Ib), the goal not(even@S) in the only rule in $\mathcal{P}^-$ is replaced by $(S < 0) \vee ((S \geq 0) \wedge not(even@S))$. Normalising the resulting clause gives rise to two clauses: (a)

even@T :- T=S+1, S>=0, S<0.

and (b)

even@T :- T=S+1, S>=0, not(even@S).

(a) is thrown away as it has an unsatisfiable constraint part while (b) is a rule and replaces the original rule. Thus, after step (Ib), $\mathcal{P}^-$ contains one rule (b) and $\mathcal{P}^F$ contains one fact: (c)

even@T :- T=0.

Step (Ic) doesn't change the configuration. Step (II) selects (c) from $\mathcal{P}^F$. Step (III) removes (c) and propagates it through (b), resulting in the following rules

even@T :- T=S+1, S>=0, S<>0, S<0.

even@T :- T=S+1, S>=0, S<>0, S>=0, not(even@S).

The first clause is discarded as its body constraint is unsatisfiable. The configuration after the first iteration is illustrated in Figure 1(b).

Now consider the second iteration. $\mathcal{P}^F$ is empty while $\mathcal{P}^-$ contains one rule. 2 is the approximate success time of $\mathcal{P}^-$. So, $t = 2$ after step (Ia). Step (Ib) extracts from the rule the following two clauses: (d)

even@T :- T=S+1, S>=0, S<>0, S<2.

and (e)

even@T :- T=S+1, S>=2, not(even@S).

(d) and (e) replace the original rule. Step (Ic) doesn't change the configuration. Step (II) generates (d). In step (III), (d) is removed and propagated through (e), giving rise to the following clauses

even@T :- T=S+1, S>=2, S<>2, S<2.

even@T :- T=S+1, S>=2, S<>2, S>=2, not(even@S).
(a) Initial configuration:

Timer: t = 0
Current Program:
  even@T :- T=0.
  even@T :- T=S+1, S>=0, not(even@S).
Generated Facts: (none)

(b) Configuration after 1st iteration:

Timer: t = 0
Current Program:
  even@T :- T=S+1, S>=0, S<>0, not(even@S).
Generated Facts:
  even@T :- T=0.

(c) Configuration after 2nd iteration:

Timer: t = 2
Current Program:
  even@T :- T=S+1, S>=2, S<>2, not(even@S).
Generated Facts:
  even@T :- T=0.
  even@T :- T=2.

Fig. 1. The first three configurations for the even numbers program



The first clause is discarded and the second replaces (e). The configuration after 
the second iteration is illustrated in figure 1(c). 

□ 
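The time-ordered evaluation of Example 1 can be mimicked on ground integer timestamps with a few lines of Python. This is only a sketch under simplifying assumptions: it works with ground times instead of constraints, it remembers generated facts in a set (whereas the operational semantics discards them and propagates them into the program), and the function name `even_facts` is hypothetical.

```python
def even_facts(horizon):
    """Yield, in ascending order, the timestamps T < horizon with even@T.

    Because the rule is causal (the negative literal refers to time S = T-1),
    not(even@S) can be decided from facts generated at earlier times.
    """
    generated = set()
    for t in range(horizon):          # the timer only moves forward
        # even@T :- T=0.
        holds = (t == 0)
        # even@T :- T=S+1, S>=0, not(even@S).
        s = t - 1
        if s >= 0 and s not in generated:
            holds = True
        if holds:
            generated.add(t)
            yield t


first_facts = list(even_facts(7))
```

With a horizon of 7 the sketch produces the timestamps 0, 2, 4 and 6, in that order.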

We now present the correctness and the completeness of the operational semantics. It is obvious that the timestamps of generated facts are in ascending order. In the sequel, we denote the original program by $\mathcal{P}$, the current context after the $i$-th iteration by $(t_i, \mathcal{P}_i)$ and the fact generated during the $i$-th iteration by $f_i$. Thus, the sequence of configurations obtained during the execution of $\mathcal{P}$ is $(t_0, \mathcal{P}_0), \ldots, (t_i, \mathcal{P}_i), \ldots$ where $(t_0, \mathcal{P}_0)$ is the initial configuration, and the sequence of generated facts is $f_1, \ldots, f_i, \ldots$. Note that $t_{i+1}$ and $f_{i+1}$ are determined by $(t_i, \mathcal{P}_i)$.

The following lemma shows that the model semantics of the current program 
can only contain ground facts whose timestamps are no less than the minimum 
time at which the body constraint of a clause is satisfiable. 

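Lemma 3. Let $a@t$ be a ground atom. If $a@t \in CM(\mathcal{P}_i)$ then $t \geq \tau_{a,\mathcal{P}_i}$.

Proof. Let $a = p(s)$ and the set of the clauses for $p$ in $\mathcal{P}_i$ be $\{p(x)@T \leftarrow \delta_o, L_o \mid 1 \leq o \leq m\}$. Let $\mu = \{x \mapsto s, T \mapsto t\}$. Since $p(s)@t \in CM(\mathcal{P}_i)$, we have $\mathcal{D} \models \mu(\delta_o)$ and $CM(\mathcal{P}_i) \models_{\mathcal{D}} \mu(L_o)$ for some $1 \leq o \leq m$. Hence $t \geq \tau_{a,\mathcal{P}_i}$ since $\mathcal{D} \models (\mu(\delta_o) \rightarrow (\mu(T) \geq \tau_{a,\mathcal{P}_i}))$. □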

The following lemma shows that steps (Ib) and (Ic) preserve the meaning of 
the program. 
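Lemma 4. $CM(\mathcal{P}_i) = CM(extr(\mathcal{P}_i, t_{i+1}))$.

Proof. Let $Q = extr_t(\mathcal{P}_i)$ and $R = extr_p(Q, t_{i+1})$. We first prove $CM(\mathcal{P}_i) = CM(Q)$ by showing $GL([extr_t(C, \mathcal{P}_i)]_{\mathcal{D}}, CM(\mathcal{P}_i)) = GL([C]_{\mathcal{D}}, CM(\mathcal{P}_i))$ for each $C \in \mathcal{P}_i^-$. Let $C = (h@T \leftarrow \delta, \bigwedge_{k \in K} not(a_k@S_k))$.

($\subseteq$) Let $\mu(h@T \leftarrow) \in GL([extr_t(C, \mathcal{P}_i)]_{\mathcal{D}}, CM(\mathcal{P}_i))$. Then $extr_t(C, \mathcal{P}_i)$ contains a $C' = (h@T \leftarrow \delta, \bigwedge_{k \in K} b_k)$ such that $\mathcal{D} \models \mu(\delta)$ and $CM(\mathcal{P}_i) \models_{\mathcal{D}} \mu(b_k)$. $b_k$ is either $S_k < \tau_{a_k,\mathcal{P}_i}$ or $(S_k \geq \tau_{a_k,\mathcal{P}_i}) \wedge not(a_k@S_k)$. It can be shown that $\mu(a_k@S_k) \notin CM(\mathcal{P}_i)$ in both cases, implying $\mu(h@T \leftarrow) \in GL([C]_{\mathcal{D}}, CM(\mathcal{P}_i))$.

($\supseteq$) Let $\mu(h@T \leftarrow) \in GL([C]_{\mathcal{D}}, CM(\mathcal{P}_i))$. Then $\mathcal{D} \models \mu(\delta)$ and $\mu(a_k@S_k) \notin CM(\mathcal{P}_i)$ for each $k \in K$. Let $C' = (h@T \leftarrow \delta, \bigwedge_{k \in K} b_k)$ with $b_k$ being $S_k < \tau_{a_k,\mathcal{P}_i}$ if $\mathcal{D} \models \mu(S_k < \tau_{a_k,\mathcal{P}_i})$ or being $(S_k \geq \tau_{a_k,\mathcal{P}_i}) \wedge not(a_k@S_k)$ otherwise. Then we have $\mu(h@T \leftarrow) \in GL([C']_{\mathcal{D}}, CM(\mathcal{P}_i))$, implying $\mu(h@T \leftarrow) \in GL([extr_t(C, \mathcal{P}_i)]_{\mathcal{D}}, CM(\mathcal{P}_i))$.

It remains to prove $CM(Q) = CM(R)$. $t_{i+1} = m_{Q^F \cup Q^-}$ because, for any $C = (h@T \leftarrow \delta, \bigwedge_{k \in K} not(a_k@S_k))$ in $\mathcal{P}_i$, there is $C'$ in $Q$ such that $m_{C'} = m_C$ where $C' = (h@T \leftarrow \delta, \bigwedge_{k \in K} b_k)$ with $b_k$ being $S_k < \tau_{a_k,\mathcal{P}_i}$ if $\mathcal{D} \models (\delta, (S_k < \tau_{a_k,\mathcal{P}_i}))$ and $b_k$ being $(S_k \geq \tau_{a_k,\mathcal{P}_i}) \wedge not(a_k@S_k)$ otherwise. If $t_{i+1} = m_{Q^F}$ then $R = Q$ and hence $CM(Q) = CM(R)$. Otherwise, $Q$ contains a $C = (h@T \leftarrow \delta, \bigwedge_{k \in K} not(a_k@S_k))$ such that $m_C = t_{i+1}$, $strat(C) \leq strat(C'')$ for any other negative rule $C''$ in $Q$ with $m_{C''} = t_{i+1}$, and $R = Q \cup \{h@T \leftarrow \delta \wedge (T = t_{i+1})\}$. By lemma 2, we have $[h@T \leftarrow \delta \wedge (T = t_{i+1})]_{\mathcal{D}} \subseteq CM(Q)$ and $(\mathcal{D} \models \mu(\delta \wedge (T = t_{i+1}))) \rightarrow \mu(a_k@S_k) \notin CM(Q)$ for any valuation $\mu$. So, $GL([R]_{\mathcal{D}}, CM(Q)) = GL([Q]_{\mathcal{D}}, CM(Q))$, implying that $CM(Q)$ is the least Herbrand model of $GL([R]_{\mathcal{D}}, CM(Q))$. Therefore, $CM(Q) = CM(R)$. □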

The following lemma states that each cycle of iteration is correct and com- 
plete with respect to the model semantics. 

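Lemma 5. $CM(\mathcal{P}_i) = [f_{i+1}]_{\mathcal{D}} \cup CM(\mathcal{P}_{i+1})$.

Proof. Let $R = extr(\mathcal{P}_i, t_{i+1})$. By lemma 4, it suffices to prove $CM(R) = [f_{i+1}]_{\mathcal{D}} \cup CM(\mathcal{P}_{i+1})$ by proving (1) $CM(R) \supseteq [f_{i+1}]_{\mathcal{D}} \cup CM(\mathcal{P}_{i+1})$ and (2) $CM(R) \subseteq [f_{i+1}]_{\mathcal{D}} \cup CM(\mathcal{P}_{i+1})$.

(1). $CM(R) \supseteq [f_{i+1}]_{\mathcal{D}}$ since $f_{i+1} \in R$. So, it suffices to show $CM(R) \supseteq CM(\mathcal{P}_{i+1})$. $\mathcal{P}_{i+1}$ is locally stratified and hence $CM(\mathcal{P}_{i+1})$ is the least Herbrand model of $[\mathcal{P}_{i+1}]_{\mathcal{D}}$. Therefore, it reduces to proving that $CM(R)$ is a model of $[\mathcal{P}_{i+1}]_{\mathcal{D}}$. $CM(R)$ is a model of $[R^F \setminus \{f_{i+1}\}]_{\mathcal{D}}$. So, it remains to prove that $CM(R)$ is a model of $[prop(C, f_{i+1}, t_{i+1})]_{\mathcal{D}}$ for each $C \in R^r$. Let $C$ be $h@T \leftarrow \delta, \bigwedge_{j \in J} a_j@S_j, \bigwedge_{k \in K} not(a_k@S_k)$ and $f_{i+1}$ be $h' \leftarrow \delta'$. Then $prop(C, f_{i+1}, t_{i+1})$ is $h@T \leftarrow \delta, \bigwedge_{j \in J} b_j, \bigwedge_{k \in K} not(b_k)$ where $b_i$ is $(a_i@S_i \wedge (S_i \geq t_{i+1})) \vee ((a_i@S_i = \rho_i(h')) \wedge \rho_i(\delta'))$. Let $\nu$ be an arbitrary valuation such that $\mathcal{D} \models \nu(\delta)$, (i) $CM(R) \models_{\mathcal{D}} \nu(b_j)$ for $j \in J$, and (ii) $CM(R) \not\models_{\mathcal{D}} \nu(b_k)$ for $k \in K$. (i) implies $\nu(a_j@S_j) \in CM(R)$ because $[f_{i+1}]_{\mathcal{D}} \subseteq CM(R)$ and (ii) implies $\nu(a_k@S_k) \notin CM(R)$. So, $\nu(h@T) \in CM(R)$ because $CM(R)$ is a model of $[R]_{\mathcal{D}}$. This completes the proof of (1).

(2). Let $W = [f_{i+1}]_{\mathcal{D}} \cup CM(\mathcal{P}_{i+1})$. It reduces to proving that $W$ is a model of $[R]_{\mathcal{D}}$ because $R$ is locally stratified and $CM(R)$ is the least Herbrand model of $[R]_{\mathcal{D}}$. As $R^F \subseteq \mathcal{P}_{i+1} \cup \{f_{i+1}\}$, it suffices to prove $W$ is a model of $[C]_{\mathcal{D}}$ for every $C \in R^r$. Let $C$ be $h@T \leftarrow \delta, \bigwedge_{j \in J} a_j@S_j, \bigwedge_{k \in K} not(a_k@S_k)$ and $f_{i+1}$ be $h' \leftarrow \delta'$. Then $prop(C, f_{i+1}, t_{i+1})$ is $h@T \leftarrow \delta, \bigwedge_{j \in J} b_j, \bigwedge_{k \in K} not(b_k)$ where $b_i = (a_i@S_i \wedge (S_i \geq t_{i+1})) \vee ((a_i@S_i = \rho_i(h')) \wedge \rho_i(\delta'))$. Let $\nu$ be any valuation such that $\mathcal{D} \models \nu(\delta)$, (iii) $W \models \nu(a_j@S_j)$ for $j \in J$ and (iv) $W \not\models \nu(a_k@S_k)$ for $k \in K$. (iii) implies $CM(\mathcal{P}_{i+1}) \models_{\mathcal{D}} \nu(b_j)$ for otherwise, $\nu(a_j@S_j) \notin CM(\mathcal{P}_{i+1})$, $\nu(a_j@S_j) \notin [f_{i+1}]_{\mathcal{D}}$ and hence $\nu(a_j@S_j) \notin W$. (iv) implies $CM(\mathcal{P}_{i+1}) \not\models_{\mathcal{D}} \nu(b_k)$ for otherwise, either $CM(\mathcal{P}_{i+1}) \models_{\mathcal{D}} \nu(a_k@S_k \wedge (S_k \geq t_{i+1}))$ or $\nu(a_k@S_k) \in [f_{i+1}]_{\mathcal{D}}$, contradicting $\nu(a_k@S_k) \notin W$. So, $\nu(h@T) \in CM(\mathcal{P}_{i+1}) \subseteq W$ as $CM(\mathcal{P}_{i+1}) \models [prop(C, f_{i+1}, t_{i+1})]_{\mathcal{D}}$ and hence $W \models [C]_{\mathcal{D}}$. This completes the proof of (2). □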

The following theorem establishes the correctness of the operational seman- 
tics, that is, every generated fact is contained in the model semantics of the 
program. 

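Theorem 2. $[f_i]_{\mathcal{D}} \subseteq CM(\mathcal{P})$ for each $i > 0$.

Proof. We have $CM(\mathcal{P}) = [\{f_1, \ldots, f_{i+1}\}]_{\mathcal{D}} \cup CM(\mathcal{P}_{i+1})$ by repeatedly applying lemma 5. So, $[f_i]_{\mathcal{D}} \subseteq CM(\mathcal{P})$ for each $i > 0$. □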

The following theorem states that the operational semantics is complete in 
the sense that any ground atom in the model semantics of the original program 
is a ground instance of a generated fact or in the model semantics of the current 
program and that any ground atom in the model semantics of the original pro- 
gram with a timestamp smaller than the current value of the timer is a ground 
instance of a generated fact. 

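Theorem 3. Let $f$ be a fact. If $[f]_{\mathcal{D}} \subseteq CM(\mathcal{P})$ then

(a) $[f]_{\mathcal{D}} \subseteq [\{f_1, \ldots, f_i\}]_{\mathcal{D}} \cup CM(\mathcal{P}_i)$; and

(b) $[f^{<t_i}]_{\mathcal{D}} \subseteq [\{f_1, \ldots, f_{i-1}\}]_{\mathcal{D}}$.

Proof. (a) is a corollary of lemma 5. (b) follows from lemmas 5 and 3. □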

5 Conclusion and Discussion 

We have presented a bottom-up operational semantics for temporally stratified 
Starlog programs. Its correctness and completeness with respect to its model 
semantics are given. For simplicity, we have assumed that every atom is times- 
tamped. The operational semantics can be easily modified to cope with un- 
timestamped literals by applying only predicate symbol stratification to un- 
timestamped literals. 
The operational semantics strictly generalises the previous work on bottom-up
execution of CLP programs. There has been little work on bottom-up execution
of CLP programs, and it has only been proposed for constraint deductive
databases [25,23]. These proposals do not deal
with negation. Upon generating a fact, our operational semantics propagates it 
through rules in the current program resulting in a new program. This removes 
the need for maintaining a list of generated facts and the need for garbage col- 
lecting useless facts in the list. 

Bottom-up execution has been proposed for general logic programs. Among
others, Fages [13], Teusink [43], Kemp et al. [24], and Sacca and Zaniolo [40]
proposed fixpoint operators for computing stable and well-founded models of
general logic programs. A major problem with applying these operators to
Starlog programs is that none of them ensures the time-orderedness of generated
facts, which is essential in temporal applications such as simulation.
Furthermore, a fact in a stable model of a general logic program can be generated by
operators in [13,43,40] only after the model has been fully constructed. This is 
because a fact added to a model under construction may have to be withdrawn 
from the model later in order to resolve an inconsistency in the model. A second 
major problem is that these operators do not deal with constraints. Though our 
model semantics is based on the stable semantics of ground general logic pro- 
grams, our operational semantics deals with Starlog programs directly instead 
of the corresponding ground general logic program. This is necessary because 
the ground general logic program corresponding to a Starlog program is usually 
infinite. This is in contrast with deductive databases for which the operators 
in [40] and [24] are formulated where the ground general logic program is finite. 

TLP languages such as Templog [1], Tokio [2], Temporal Prolog [14],
Tempura [32], and Chronolog [47] use top-down operational semantics that extend
SLD resolution. Brzoska shows that Templog programs can be translated into CLP
programs [7], and Orgun et al. suggest that programs in other TLP languages
can also be translated into CLP programs [34]. Thus, our operational semantics 
offers a bottom-up execution mechanism for these TLP languages. 

Xiao et al. propose a bottom-up algorithm for executing Starlog programs [49].
Their algorithm also generates facts in time order and uses causality to deal
with negation. However, its correctness and completeness with respect to a
model semantics are not rigorously established. Our operational semantics works
on constraint programs while theirs does not.

The operational semantics of Starlog presented in this paper is abstract. We 
have so far not considered the issue of termination for a number of reasons. 
Termination is undecidable, and there is no operational semantics which will 
terminate on all programs. Also, we observe that techniques for improving the 
termination of bottom-up evaluation of logic programs can be easily incorporated 
into our operational semantics without affecting its correctness and completeness.
The following example illustrates the need for subsumption tests [25,23,42,29].
Consider the following program.

p(X)@T :- T=0, X=a.

p(X)@T :- T>=0, p(X)@T.

It is obvious that the meaning of this program is the singleton set consisting of
the fact

p(X)@T :- T=0, X=a.

It is easy to verify that each iteration of the operational semantics will gener- 
ate the above fact and leave the program unchanged. That is, the program does 
not terminate. Simple checking for duplicate solutions will solve this case but in 
general more powerful subsumption tests are necessary [29]. This raises complex 
issues of the tradeoff between computational efficiency and the set of programs 
on which the operational semantics will terminate. 
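The simple duplicate check mentioned above can be sketched as follows. This is a minimal illustration, not part of the operational semantics as defined in the paper: facts are assumed to be ground and hashable, the generator of facts is abstracted as a hypothetical callback `next_fact`, and the check is plain equality, which is weaker than the subsumption tests of [29].

```python
def enumerate_with_duplicate_check(next_fact, max_iterations=1000):
    """Collect generated facts until one repeats or the budget is exhausted."""
    seen = set()
    generated = []
    for _ in range(max_iterations):
        fact = next_fact()
        if fact in seen:      # duplicate solution: stop instead of looping
            break
        seen.add(fact)
        generated.append(fact)
    return generated


# On the program above every iteration regenerates the same fact p(a)@0.
always_same = lambda: ("p", "a", 0)
result = enumerate_with_duplicate_check(always_same)
```

With this check the enumeration stops after the first repetition and returns the single fact. Facts carrying non-ground constraints would need a genuine subsumption test in place of the set lookup.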
Abstract. We first define a new fixpoint semantics which correctly models
finite failure and is and-compositional. We then consider the problem of
verification w.r.t. finite failure and we show how Ferrand's approach, using
both a least fixpoint and a greatest fixpoint semantics, can be adapted to
finite failure. The verification method is not effective. Therefore, we
consider an approximation from above and an approximation from below of our
semantics, which give two different finite approximations. These
approximations are used for effective program verification.

Keywords: Abstract interpretation, Logic programming, Program verification,
Finite failure.



1 Introduction 

Assume we have a semantics defined as the least fixpoint of a continuous operator $F$
on the lattice of “interpretations” and an interpretation $I$ which specifies the
expected program semantics. The program is partially correct w.r.t. $I$ iff $lfp(F) \subseteq I$.
A sufficient partial correctness condition, which can be verified without actually
computing the fixpoint, is $F(I) \subseteq I$.
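For a finite ground program the sufficient condition can be checked directly. The sketch below is an illustration only: it assumes a propositional program given as (head, body) pairs, takes $F$ to be the standard immediate consequences operator, and uses the hypothetical names `tp` and `lfp`. The example program and specification are made up for the purpose.

```python
def tp(program, interp):
    """Immediate consequences: heads of clauses whose body holds in interp."""
    return frozenset(h for h, body in program if all(b in interp for b in body))


def lfp(program):
    """Least fixpoint of tp by Kleene iteration from the empty interpretation."""
    current = frozenset()
    while True:
        nxt = tp(program, current)
        if nxt == current:
            return current
        current = nxt


# p.   q :- p.   r :- s.
program = [("p", ()), ("q", ("p",)), ("r", ("s",))]
spec = frozenset({"p", "q", "t"})          # intended semantics I

condition_holds = tp(program, spec) <= spec    # F(I) subset of I
partially_correct = lfp(program) <= spec       # lfp(F) subset of I
```

Here one application of `tp` to the specification is enough to establish the condition, and the least fixpoint computed afterwards confirms that the program is partially correct with respect to it.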

In the case of logic programs, this is the approach taken by declarative de- 
bugging (diagnosis) [21,22], where the semantics is the least Herbrand model. 
The approach has been extended to model other observable properties such as 
correct answers [12], computed answers and their abstractions [7]. In [23,17], this 
technique has been recently related to other techniques used in logic program ver- 
ification by showing that all the existing methods [4,11,2] can be reconstructed as 
instances of a general verification technique based on the above defined sufficient 
condition, where the semantic evaluation function (and the notion of interpre- 
tation) can be chosen by using abstract interpretation techniques [9,10] so as to 
model pre- and post-conditions, call correctness and specifications by means of 
assertions. The overall idea is that the property one wants to verify is simply an 
abstract semantics on a suitable abstract domain. 

There is one interesting and specific property of logic programs, finite failure,
which is not an abstraction of any of the semantics used in the above mentioned
techniques and/or verification frameworks. Diagnosis or verification of finite
failure is somewhat related to the diagnosis of missing answers in [13], where the
actual semantics is the greatest fixpoint of the standard ground immediate
consequences operator (i.e. the complement of a set of atoms which contains the
finite failure set and some atoms whose execution does not terminate).

However, if we want to verify properties of finite failures, we need to start 
from a fixpoint semantics modeling finite failure. 

Unfortunately, none of the semantics defined for finite failure so far is
adequate for our purposes. The (ground) finite failure set $FF_P$ (the set of ground
atoms which finitely fail in $P$) [3] does not model non-ground failure. The
Non-Ground Finite Failure set $NGFF_P$ (the set of finitely failed non-ground atoms in
$P$) [16] was proved in [14] to be correct w.r.t. finite failure and and-compositional
(i.e. the failure of conjunctive goals can be derived from the behavior of
atomic goals only). However, $NGFF_P$ has no fixpoint characterization.

Our first step was then the development of a fixpoint definition for NGFFp. 
The fixpoint semantics defined in [15] is derived from a semantics which extends 
with infinite computations the trace semantics in [8], by defining a Galois in- 
sertion modeling finite failure. The corresponding abstract fixpoint semantics 
correctly models finite failure and is and-compositional. 

In this paper we take this semantics (shortly described in Section 2) as the
basis for a verification method (defined in Section 3), which extends to finite
failure Ferrand's approach [13], which uses two semantics (a least fixpoint and
a greatest fixpoint semantics) and two specifications. In particular, we apply
Ferrand's approach using a least fixpoint semantics ($T_P^{ff} \uparrow \omega$) and a $T_P^{ff} \downarrow \omega$
semantics. We obtain a nice interpretation for the verification w.r.t. the $T_P^{ff} \downarrow \omega$
semantics, i.e. $T_P^{ff} \downarrow \omega$ models the unsolvable atomic goals as introduced in [5].
The verification method is not effective. We consider therefore an approximation
from above (Section 4.1) and an approximation from below (Section 4.2), which
give two different finite approximations of the Non-Ground Finite Failure set
and of the success set, the set of atoms which have a successful derivation.

Finally, in Section 5, we make the techniques of Section 3 effective by using
the approximations from above and from below of Section 4 applied to the least
fixpoint semantics and to the $T_P^{ff} \downarrow \omega$ semantics respectively.

2 A Fixpoint Semantics for Finite Failure 

As already mentioned, the finite failure semantics operator of definition 3 is 
systematically derived from a trace semantics which models successful and infi- 
nite derivations by using abstract interpretation techniques. A Galois insertion 
modeling finite failure is defined on an abstract domain suitable to model finite 
failure and to make the abstract operator complete (i.e. precise). Here we just 
give the semantics for finite failure, together with some technical definitions, 
which are needed to achieve a better understanding. 

The reader is assumed to be familiar with the terminology of and the basic
results in the semantics of logic programs [1,18] and with the theory of abstract
interpretation as presented in [9,10]. Moreover, we will denote by x and t a tuple
of distinct variables and a tuple of terms respectively, while B and G will denote a
(possibly empty) conjunction of atoms. By $\vartheta_1 :: \cdots :: \vartheta_n :: \cdots$ we indicate a
(possibly infinite) sequence of substitutions such that $\forall i \geq 1$, $dom(\vartheta_i) = dom(\vartheta_{i+1})$
and $\vartheta_i \leq \vartheta_{i+1}$. When we consider a sequence of substitutions for a goal G we
assume all the substitutions in the sequence to be relevant for G.

Finite failure is a downward closed property, i.e., if $G$ finitely fails then $G\vartheta$
finitely fails too. Moreover it enjoys a kind of “upward closure”. Namely, if the
goal $G$ does not finitely fail, then there exists a (possibly infinite) sequence of
substitutions $\vartheta_1 :: \cdots :: \vartheta_n :: \cdots$, such that for every $G'$ which finitely fails, there
exists a $j$ such that $G'$ does not unify with $G\vartheta_h$, for $h \geq j$.

Note that the above mentioned sequence of substitutions can be viewed as
the one computed by an infinite or successful derivation for the goal $G$. If we
cannot find such a sequence for the goal $G$, then $G$ finitely fails. Now, suppose we
know that a given set $C$ of goals finitely fails. We can infer that an instance $G\vartheta$
of a goal $G$ finitely fails if for all sequences of substitutions $\vartheta_1 :: \cdots :: \vartheta_n :: \cdots$,
there exists a $G' \in C$ such that $\forall i$, $G'$ unifies with $G\vartheta\vartheta_i$.

The intuition behind the above remarks can be formalized by an operator on 
Goals, where Goals is the domain of goals of a program P. 

Definition 1. Let $C \subseteq Goals$ and $G \in Goals$.

$up^{ff}_G(C) = C \cup \{ G\vartheta \mid$ for all (possibly infinite) sequences of relevant substitutions for the goal $G$,
$\vartheta_1 :: \cdots :: \vartheta_n :: \cdots$,
there exists a $G' \in C$ such that $\forall i$, $G'$ unifies with $G\vartheta\vartheta_i \}$.

$up^{ff}_G$ is a closure operator, i.e., it is monotonic w.r.t. set inclusion, idempotent
and extensive. Note that $\bigcup_{p(x)} up^{ff}_{p(x)}$, with $p$ a predicate in $P$, is a closure operator
too.

The extended Herbrand base $B_V$ for $P$ is the set of atoms built with the
predicate symbols of $P$ ($\Pi_P$) on the domain of terms with variables $T$. Let $S$ be
the domain of downward closed subsets of $B_V$ which are also closed with respect
to $\bigcup_{p(x)} up^{ff}_{p(x)}$. $S$ is our semantic domain. $(S, \subseteq)$ is a complete lattice where
the least upper bound of $X_1, X_2 \in S$ is the set $\bigcup_{p(x)} up^{ff}_{p(x)}(X_1 \cup X_2)$, while the
glb is intersection.

The next operator, given two atoms p(t) and A, defines all the instances of 
p(t) which do not unify with A. 

Definition 2. Let $p(t), A \in B_V$.

$NUnif_{p(t)}(A) = \{ p(t)\vartheta \mid p(t)\vartheta$ is not unifiable with $A \}$
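Membership in $NUnif_{p(t)}(A)$ boils down to a unifiability test. The following Python sketch shows one way such a test could be written; it is an illustration with assumed conventions, not taken from the paper: variables are strings starting with an upper-case letter, constants are other strings, compound terms are tuples `(functor, arg1, ..., argn)`, and the two atoms are assumed to have been renamed apart. The function names are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst):
    """Occurs check: does `var` appear in `term` under `subst`?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False


def unify(a, b):
    """Return a unifying substitution (as a dict) or None if none exists."""
    subst = {}
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            if occurs(x, y, subst):
                return None
            subst[x] = y
        elif is_var(y):
            if occurs(y, x, subst):
                return None
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None
    return subst


def in_nunif(instance, atom):
    """True if `instance` (an instance of p(t)) is not unifiable with `atom`."""
    return unify(instance, atom) is None


# Head of the first clause of P3 in Example 1: p(f(Z), f(f(Z))).
head = ("p", ("f", "Z"), ("f", ("f", "Z")))
```

For instance, `in_nunif(("p", "X", "X"), head)` holds because the occurs check fails, whereas `("p", "X", ("f", "X"))` does unify with the head.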

Let us now define the fixpoint operator. 
Definition 3. Let $I \in S$.

$T_P^{ff}(I) = \{ p(\tilde{t}) \mid$ for every clause defining the procedure $p$, $p(t) \text{ :- } B \in P$,
$p(\tilde{t}) \in up^{ff}_{p(x)}( NUnif_{p(x)}(p(t)) \cup \{ p(t)\vartheta \mid \vartheta$ is a relevant substitution for $p(t)$, $B\vartheta \in up^{ff}_B(C) \} ) \}$
where $C = \{ B\sigma \mid B = B_1, \ldots, B_n,\ \exists B_i\sigma \in I \}$.

Note that $T_P^{ff}(I)$ includes atoms such that for each clause either they do not
unify with the head of the clause or, after unification, the body of the clause
$B\vartheta$ belongs to the $up^{ff}_B$ closure of the goals $B\sigma$ which finitely fail according to $I$
(that is, if $B = B_1, \ldots, B_n$ there exists $B_i\sigma \in I$).

$T_P^{ff}$ is monotonic and continuous. Unfortunately $T_P^{ff}$ is not finitary. In the next
sections, we will define approximations of this operator which will allow us
to derive information on finite failure in an effective way.

By defining the ordinal powers $T_P^{ff} \uparrow i$ in the usual way, our semantics will
be $lfp(T_P^{ff}) = lub(\{ T_P^{ff} \uparrow i \mid i < \omega \}) = \bigcup_{i < \omega} T_P^{ff} \uparrow i$.

Example 1. Assume $L_P = \{f, a\}$ and let $P_1$ be the program

P1: q(a) :- p(X)
    p(f(X)) :- p(X)

$T_{P_1}^{ff} \uparrow 1 = T_{P_1}^{ff}(\emptyset) = \{$ q(f(X)), q(f(f(X))), ...
    q(f(a)), q(f(f(a))), ...
    p(a) $\}$,

$T_{P_1}^{ff}(T_{P_1}^{ff} \uparrow 1) = \{$ q(f(X)), q(f(f(X))), ...
    q(f(a)), q(f(f(a))), ...
    p(a), p(f(a)) $\}$,

$lfp(T_{P_1}^{ff}) = \{$ q(f(X)), q(f(f(X))), ...
    q(f(a)), q(f(f(a))), ...
    p(a), p(f(a)), p(f(f(a))), ... $\}$.

Consider now

P2: q(a) :- p(X)
    p(f(X)) :- p(a)

$T_{P_2}^{ff} \uparrow 1 = \{$ q(f(X)), q(f(f(X))), ...
    q(f(a)), q(f(f(a))), ...
    p(a) $\}$,

$T_{P_2}^{ff} \uparrow 2 = \{$ q(f(X)), q(f(f(X))), ...
    q(f(a)), q(f(f(a))), ...
    p(X), p(f(X)), ...
    p(a), p(f(a)), ... $\}$,

$lfp(T_{P_2}^{ff}) = \{$ q(X), q(f(X)), q(f(f(X))), ...
    q(a), q(f(a)), q(f(f(a))), ...
    p(X), p(f(X)), ...
    p(a), p(f(a)), ... $\}$.

Finally, consider

P3: p(f(X), f(f(X))) :- p(X, f(X))
    q(f(Y), f(Y)) :- q(Y, Y)

$lfp(T_{P_3}^{ff}) = \{$ $p(f^n(X), f^m(X))$, $m \neq n+1$,
    $p(t_1, t_2)$, $t_1$ or $t_2$ ground terms,
    $q(f^n(X), f^m(X))$, $m \neq n$,
    $q(t_1, t_2)$, $t_1$ or $t_2$ ground terms $\}$.

As we already pointed out, this fixpoint semantics was automatically derived
by abstract interpretation from the operational semantics of (possibly infinite)
traces via a fair selection rule. This assures us that $lfp(T_P^{ff})$ really models the
non-ground atoms that have a finite failure. Note that $lfp(T_P^{ff})$ gives a direct
fixpoint characterization of the set of non-ground atoms which finitely fail in $P$,
$NGFF_P$ [16]. Moreover, the following theorem shows how this automatic derivation
also allows us to define simpler conditions than the one presented in [14] for
and-compositionality.

Theorem 1. Let $G \in Goals$.

- $lfp(T_P^{ff}) = lfp(T_Q^{ff})$ iff every goal $G$ has the same behavior w.r.t. finite failure
in the program $P$ and in the program $Q$.

- the goal $G$ finitely fails in $P$ iff

$G \in up^{ff}_G(\{ G\vartheta \mid G = B_1, \ldots, B_n$ and $\exists B_i\vartheta \in lfp(T_P^{ff}) \})$

The first property (correctness) assures that $lfp(T_P^{ff})$ correctly models finite
failure, while the second property (and-compositionality) tells us how to infer
the behavior of conjunctive goals from information on the finite failure of atomic
goals only.
Example 2. Consider the program $P_3$ of example 1. The goal (p(H,V), q(H,V))
finitely fails in $P_3$, since (p(H,V), q(H,V)) $\in up^{ff}_{(p(H,V),q(H,V))}(C)$ where

$C = \{$ $(p(f^n(X), f^m(X)), q(f^n(X), f^m(X)))$, $m \neq n+1$,
    $(p(f^n(X), f^m(X)), q(f^n(X), f^m(X)))$, $m \neq n$,
    $(p(t_1, t_2), q(t_1, t_2))$, $t_1$ or $t_2$ ground terms $\}$

This is true because, for all possible sequences of substitutions $\vartheta_1 :: \cdots :: \vartheta_n :: \cdots$
for (p(H,V), q(H,V)), there exists a (p(H,V), q(H,V))$\sigma \in C$ which unifies with
each (p(H,V), q(H,V))$\vartheta_i$.

3 Using Expected Least Fixpoint and $T_P^{ff} \downarrow \omega$ Semantics
in Program Verification

Once we have a fixpoint semantics modeling finite failure, we can state the usual
condition $T_P^{ff}(S) \subseteq S$, which is a sufficient condition for partial correctness since
it implies $NGFF_P = lfp(T_P^{ff}) \subseteq S$, where $S$ is the intended Non-Ground Finite
Failure set. The above condition is not effective because, as already noted, $T_P^{ff}$
is not finitary and $S$ is an infinite set. We will tackle this problem later, by using
finite computable approximations of the semantics. For the time being, we want
to show that we can define stronger verification conditions by following Ferrand's
approach, which uses two intended semantics.

The semantics considered in [13] is based on the standard ground immediate
consequences operator $T_P$. Two different sets of expected properties are
considered.

- a set of properties $S$ to be verified by $lfp(T_P)$ (partial correctness means
$lfp(T_P) \subseteq S$).

- a set of properties $S'$ to be verified by $gfp(T_P)$ ($S' \subseteq gfp(T_P)$).

The standard sufficient condition for partial correctness based on $S$ ($T_P(S) \subseteq S$)
allows us to reason about the ground success set. In addition, there exists a new
sufficient condition ($S' \subseteq T_P(S')$), which originally was viewed as a condition
somewhat related to sufficiency or missing answers (according to declarative
debugging). The same condition allows us to reason about the behavior modeled
by the complement of the greatest fixpoint of $T_P$, which strictly includes the
(ground) finite failure set.
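The second condition can be tried out on a small ground program. The sketch below is an illustration under assumptions, not Ferrand's formulation: the program is propositional and finite, so the greatest fixpoint can be reached by iterating downwards from the Herbrand base, and the names `tp` and `gfp` as well as the example program are hypothetical.

```python
def tp(program, interp):
    """Ground immediate consequences operator for (head, body) clauses."""
    return frozenset(h for h, body in program if all(b in interp for b in body))


def gfp(program, base):
    """Greatest fixpoint of tp, by downward iteration from the finite base."""
    current = frozenset(base)
    while True:
        nxt = tp(program, current)
        if nxt == current:
            return current
        current = nxt


# p :- p.   q :- r.   (no clause for r)
program = [("p", ("p",)), ("q", ("r",))]
base = {"p", "q", "r"}

spec_prime = frozenset({"p"})                       # S'
condition_holds = spec_prime <= tp(program, spec_prime)   # S' subset of T_P(S')
greatest = gfp(program, base)
finitely_failed = frozenset(base) - greatest
```

In this toy program `p` loops, so it belongs to the greatest fixpoint without being in the least fixpoint, while `q` and `r` fall outside the greatest fixpoint and are exactly the finitely failing atoms.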

However, S' cannot be thought as the complement of the intended ground fi- 
nite failure set, since the inclusion is strict. Remember also that the ground finite 
failure set does not fully characterize finite failure (for non-ground conjunctive 
goals). 

Like the ground immediate consequences operator $T_P$, our fixpoint operator
$T_P^{FF}$ is not co-continuous. Moreover, $T_P^{FF} \downarrow \omega$ has an
interesting characterization since, by Theorem 2, it models unsolvable atomic
goals [5].

We then apply Ferrand’s approach to our least fixpoint semantics and to
$T_P^{FF} \downarrow \omega$, thus obtaining stronger results. In fact, let S be
the expected Non-Ground Finite Failure set (which fully characterizes finite
failure). The condition $T_P^{FF}(S) \subseteq S$ guarantees that the actual
Non-Ground Finite Failure set ($lfp(T_P^{FF})$) is indeed included in the
intended one.

The following theorem shows that the complement (w.r.t. $B_V$) of
$T_P^{FF} \downarrow \omega$ has a very interesting characterization as the set of
atoms which have a successful derivation.

Theorem 2. $p(t) \in B_V$ has a successful derivation if and only if
$p(t) \notin T_P^{FF} \downarrow \omega$.

We can then provide a specification S' of the complement of the set of atoms
which are intended to succeed and derive another meaningful sufficient condition
$S' \subseteq T_P^{FF}(S')$, which will guarantee that S' is indeed included in
$T_P^{FF} \downarrow \omega$, i.e. that the actual set of successful atoms is
included in the intended one.

As already mentioned, the above sufficient conditions can be turned into
effective conditions by taking finite approximations of the semantics (and
finitary abstract versions of $T_P^{FF}$).

Using two semantics and two specifications will allow us to use two different
(related) abstractions. In particular, in the next section, we will introduce an
upward approximation and a downward approximation of $NGFF_P$, both somehow
related to depth k abstraction. The idea of considering upward and downward
approximations for verification and debugging has recently been proposed in
[6]. In Section 5, we will apply this idea by using the upward approximation of
the least fixpoint semantics and the downward approximation of the
$T_P^{FF} \downarrow \omega$ semantics.

4 Towards Effective Approximations of $lfp(T_P^{FF})$ and
$T_P^{FF} \downarrow \omega$

The semantics of Section 2 is not decidable. In order to infer that an atom belongs
to $T_P^{FF} \uparrow (i + 1)$ we may need to look at infinitely many elements of
$T_P^{FF} \uparrow i$.

It is therefore interesting to define an abstraction of $lfp(T_P^{FF})$ and of
$T_P^{FF} \downarrow \omega$ on an abstract domain which gives a “correct”
approximation of the set of atoms which finitely fail in P and of the set of atoms
which have a successful derivation. The natural idea is to “approximate” an
infinite set of atoms by means of a finite set of atoms whose depth is not greater
than k.

We consider here the definition of depth given by Marriott and Søndergaard
in [19] for finite expressions, i.e. $Exp = T \cup B_V$. Let N be the set of natural
numbers not including 0, let Seq denote the set of all finite sequences of natural
numbers, and let $\varepsilon \in Seq$ denote the empty sequence. The length of a
sequence $s \in Seq$ is denoted by |s|.

Definition 4. Let $e \in Exp$, $s \in Seq$, $|s| \geq 0$. The subexpression of e at s,
e[s], is recursively defined by

- $e[is] = e_i[s]$ if $e = f(e_1, \ldots, e_n)$, otherwise $\top$,

- $e[\varepsilon] = e$.

The positions of e are $Pos(e) = \{s \in Seq \mid e[s] \neq \top\}$. If |s| = k, e[s] is
a level k subexpression of e. Then $depth(e) = \max\{|s| \mid s \in Pos(e)\}$.
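The position-based notion of depth can be made concrete with a short Python sketch. The term encoding is an assumption of this illustration, not part of [19]: a compound expression is a tuple whose first component is the functor, constants and variables are plain strings, and `None` plays the role of the undefined subexpression.

```python
def subexpression(e, s):
    """Return e[s], or None when the position s does not exist in e."""
    for i in s:
        if not isinstance(e, tuple) or not 1 <= i < len(e):
            return None
        e = e[i]  # e[0] is the functor, e[1..n] are the arguments
    return e


def positions(e):
    """All positions of e, as tuples of positive integers."""
    result = [()]
    if isinstance(e, tuple):
        for i, arg in enumerate(e[1:], start=1):
            result.extend((i,) + s for s in positions(arg))
    return result


def depth(e):
    """Length of the longest position of e."""
    return max(len(s) for s in positions(e))


atom = ("p", ("f", ("g", "a")))      # p(f(g(a)))
print(subexpression(atom, (1, 1)))   # ('g', 'a')
print(positions(atom))               # [(), (1,), (1, 1), (1, 1, 1)]
print(depth(atom))                   # 3
```

Under this encoding a constant or a variable has depth 0 and every argument level adds one.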
The previous definition on atoms naturally extends to Goals. As we will show
in the following, for $lfp(T_P^{FF})$ and $T_P^{FF} \downarrow \omega$ there exist two
useful ways of defining correct approximations. Namely, we can take
approximations which represent either subsets or supersets of $lfp(T_P^{FF})$ (or
$T_P^{FF} \downarrow \omega$).

The two approximations are defined on the same domain (the above defined 
domain of atoms whose depth is not greater than k) . Of course they will define 
two different abstractions. 

For the sake of simplicity, here we consider an upward and a downward
approximation for $lfp(T_P^{FF})$ only. However, since our approximations are based
on an abstract domain and an abstract fixpoint operator, the same approximations
can also be applied to $T_P^{FF} \downarrow \omega$.

In the next subsection we will first consider a new depth k abstraction, which
can be used as an upward approximation, i.e. an abstraction in the usual sense.

4.1 Approximating $lfp(T_P^{FF})$ from Above

In general, an upward approximation of a semantics on the depth k domain is 
expected to have the following properties. 

1. For every goal p(t) belonging to the concrete semantics, for every choice of
k, p(t) belongs also to the concretization of the abstract semantics.

2. For every goal p(t), which does not belong to the concrete semantics, there 
exists a k such that p(t) does not belong to the concretization of the abstract 
semantics. 

The first property guarantees correctness. The second property tells us that we 
can always improve the precision by choosing a better (greater) k. This property 
should hold at least for the majority of goals. 

In the case of the finite failure semantics, property 1 is easy to achieve by
using the standard abstraction on the depth k domain [20]. Achieving property 2
is instead a rather difficult task. Finite failure, in fact, is a universal
property. In order to infer that an atom finitely fails, we need to know whether
its instances (with arbitrary depth) finitely fail. Unfortunately, on the standard
depth k domain, the property of finite failure becomes existential for a “cut”
atom. By correctness, in fact, we can infer that a cut atom finitely fails if we
find one instance which finitely fails.

Example 3. Consider the program

P: p(f(X)) :- p(X).

All the ground instances of p(X) finitely fail. For any k, by correctness, the cut
atom $p(f^k(V))$ should belong to our abstraction. This is because there exists
at least one ground instance $p(X)\vartheta$ of p(X) of depth greater than k which
finitely fails, and $p(X)\vartheta$ must belong to the concretization of our
abstraction. According to our abstraction, in addition to $p(f^k(X))$ and all its
instances, also p(X) and all its instances finitely fail.
As shown by the previous example, if an atom p(t) has an SLD tree (via
a fair selection rule) with just infinite derivations, which rewrite the atom p(t)
infinitely many times (called perpetual in [14]), we cannot find a k such that p(t)
does not belong to the concretization of the abstract semantics on the depth k
domain. Note, in fact, that all the ground instances of such an atom finitely fail.
Condition 2 therefore does not hold for a very large class of goals.

We are therefore forced to define a more complex abstract interpretation on 
a new depth k domain, for which property 2 holds. 



The abstract domain. First consider the set $\tilde{V}$ of variables, disjoint from
the set V of program variables. Consider also $\Psi'$, the domain of non-idempotent
substitutions $\psi'$ such that $dom(\psi') \cup range(\psi') \subseteq V$ and, for
every $x/t \in \psi'$, $Var(t) \cap dom(\psi') \neq \emptyset$. We define the domain
of substitutions $\Psi = \Psi' \cup \{\varepsilon\}$. In the following,
$\psi \in \Psi$.

Definition 5. The abstract domain consists of atoms of the form $p(t)\psi$,
$p \in \Pi_P$, such that:

1. $depth(p(t)\psi) \leq k + 1$.

2 . 

3. if'i\> = e, Vff, cr' G Pos, p(t)cr G V implies |cr| = k and 
(cr ^ cr' p(t)ff ^ p(t)cr'). 

4- 7 ^ e, Var(p(t)rl;) C V. 

Each concrete atom is abstracted by the following function $\alpha_a$ on
atoms.



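Definition 6.

$$\alpha_a(p(t)) = \begin{cases}
p(\bar{t})\bar{\psi} & \text{if } depth(p(t)) > k + 1 \text{ and the set } \\
 & A = \{\, p(t')\psi' \mid \psi' \in \Psi',\ \exists i\ p(t) = p(t')\underbrace{\psi' \cdots \psi'}_{i},\ depth(p(t')\psi') \leq k + 1 \,\} \\
 & \text{is not empty, } p(\bar{t})\bar{\psi} \in A \text{ and } \forall p(t')\psi' \in A,\ depth(p(\bar{t})\bar{\psi}) \leq depth(p(t')\psi') \\
p(t'') & \text{otherwise}
\end{cases}$$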



where t'' is obtained by replacing each subterm rooted at depth greater than k by
a new fresh variable belonging to $\tilde{V}$.

The function $\alpha_a$ extends naturally to Goals.
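The plain cutting step used in the "otherwise" case can be sketched in Python. This is a hedged illustration, not the full abstraction function: it ignores the case that introduces a non-idempotent substitution, and it assumes one particular way of counting levels (the predicate symbol at level 0, compound subterms reaching level k replaced, constants and variables kept). The tuple encoding of terms and the generated variable names `W1`, `W2`, ... are likewise assumptions of the sketch.

```python
import itertools


def cut(term, k, fresh=None, level=0):
    """Truncate term so that no position is longer than k.

    Compound subterms that would reach below level k are replaced by
    fresh variables; constants and variables are left untouched.
    """
    if fresh is None:
        fresh = (f"W{n}" for n in itertools.count(1))
    if not isinstance(term, tuple):
        return term                  # constant or variable: nothing to cut
    if level >= k:
        return next(fresh)           # compound subterm would exceed depth k
    return (term[0],) + tuple(cut(arg, k, fresh, level + 1) for arg in term[1:])


print(cut(("p", ("f", ("g", "a"))), 2))                # ('p', ('f', 'W1'))
print(cut(("q", ("f", ("f", "a")), ("f", "a")), 2))    # ('q', ('f', 'W1'), ('f', 'a'))
```

With k = 2 the sketch maps p(f(g(a))) to p(f(W1)) and q(f(f(a)), f(a)) to q(f(W1), f(a)); a different level convention would shift the cut by one.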

Example 4. Let k = 2. Consider now $\alpha_a(p(f(g(a)))) = p(f(W))$, $W \in \tilde{V}$,
and $\alpha_a(p(f(g(X)))) = p(f(Y))\{Y/g(Y)\} = \alpha_a(p(f(g(g(g(X))))))$. Note that
$\alpha_a(p(f(f(g(X))))) = p(f(W))$, $W \in \tilde{V}$. The atom p(f(f(f(X)))) will be
approximated by p(f(X)){X/f(X)}. Moreover,
$\alpha_a(q(f(f(a)), f(a))) = q(f(W), f(a))$, $W \in \tilde{V}$,
$\alpha_a(q(f(f(f(X))), f(f(Y)))) = q(X, Y)\{X/f(X), Y/f(Y)\}$, and
$\alpha_a(q(f(f(X)), f(X))) = q(f(X), X)\{X/f(X)\}$.
Definition 7 (partial order). Consider two abstract atoms

p(t)tl), p(t')tl^'. 

p(t)tl) < p(t^)i|)^ if there exists an idempotent substitution -8, dora(-0) = V, 
range('d) = V U V, and an i such that 

p(t)tl)d = p(t') tjj't))' . . . and dom(tl)) n dom(-0) = 0 



The abstraction function First we define the optimal version of the upg 
operator. 

Definition 8. 

up«"”(C)=C U { G'S I Vfi idempotent substitution 

and Vt|) G V which satisfies 

dom(F) n dom(t(i) = 0, depth.(GF-0tl)) < k + 1 , and 
Var(Gfifii|;) C VU doTa(t()) 

GMtl) e C } 

Let be the downward closed (w.r.t. to the order on abstract atoms) subset of 
the depth k atoms which are also closed with respect to Up(x) ^Fp^(x') ■ 
is our semantics domain. is a complete lattice where the least upper 

bound of Xi , X 2 G is the set Up(x) ^kp^(x') (^1 U X 2 ), while the gib is simply 
the intersection. 

Definition 9. Let X G S 

a^P(X) ;= y upJ^^,^'J({ aa(q(t)) | q(t) G X}) 

P(x) 



Lemma 1. and its adjoint form a Galois connection. 

For the sake of simplicity, we assume that k is always greater than the depth 
of the head of the clauses in the program P. Therefore the Nunif’^f , operator 

P (X J 

becomes 

Definition 10. Let k > depth(A). 

(A) = { p(t)xj) I p(t)r(> G p(t) < p(t) and there exists an i 

such that p(t) i)) i)) ... does not unify with A} 



and the abstract fixpoint operator is 

Definition 11. Let I G and k > depth(A), A any head of a clause in P. 
Tp^ ” (I) = { p(t)r(> I p(t)i|) G for every clause defining p, 
p(t) :- B $\in$ P,

p(t)tl; e (Nunifp[^,(p(t)) U 

{aa(p(t)-8) I depth.(p(t)-0) < 2k, 

•9 relevant substitution for p(t), 

aa(Bd) Gupr”(C) }) 
where C = {Bcr | B = Bi , . . . , 3 Bicr € I). 

Note that the least fixpoint of the abstract operator is now effectively
computable, since the abstract operator is finitary.

Example 5. Let $W \in \tilde{V}$, k = 2 and consider $P_1$ in Example 1.

$lfp(T_{P_1}^{FF^{up}}) = \{q(f(X)), q(X)\{X/f(X)\}, q(f(W)), q(f(a)), p(a), p(f(a)), p(f(W))\}$

Consider $P_2$ in Example 1.

$lfp(T_{P_2}^{FF^{up}}) = \{q(a), q(X), q(W), q(f(X)), q(X)\{X/f(X)\},$
$p(a), p(X), p(f(a)), p(f(W)), p(f(X)), p(X)\{X/f(X)\}\}$.

Finally, consider

$P_4$: q(a) :- p(X).
p(X) :- s(f(f(a))).

For k = 2,

$lfp(T_{P_4}^{FF^{up}}) = \{q(X), q(a), q(f(X)), q(f(a)), q(f(W)), q(X)\{X/f(X)\},$
$p(X), p(a), p(f(X)), p(f(a)), p(f(W)), p(X)\{X/f(X)\},$
$s(a), s(f(a)), s(f(W)), s(X)\{X/f(X)\}\}$

which is not equal to $\alpha^{up}(lfp(T_{P_4}^{FF})) = lfp(T_{P_4}^{FF^{up}})$, for k = 3.



4.2 Approximating $lfp(T_P^{FF})$ from Below

We consider the depth k domain, i.e. the domain of atoms whose depth is not 
greater than k + 1 . 

We first need to define the optimal version of the $up^{bl}$ operator.

Definition 12. Let C be a set of goals whose depth is not greater than k,

$up_G^{bl}(C) = C \cup$

$\{ G\vartheta \mid depth(G\vartheta) \leq k + 1$ and

$\forall \vartheta'$ such that $depth(G\vartheta\vartheta') \leq k + 2$
there exists a $G' \in C$ which unifies with $G\vartheta\vartheta' \} \cup$
$\{ G\vartheta \mid depth(G\vartheta) > k + 1$ and there exists a $G' \in C$
which unifies with $G\vartheta \}$.
Let $S^{bl}$ be the downward closed subset of the depth k atoms which are also
closed with respect to the up operator defined above. ($S^{bl}$, $\subseteq$) is our
semantic domain. It is a complete lattice where the least upper bound of
$X_1, X_2 \in S^{bl}$ is the closure of $X_1 \cup X_2$ under the up operator, while
the glb is simply intersection.

The abstraction function just selects the atoms which have a depth not
greater than k + 1.

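Definition 13. Let $X \in S$ and $X^{\alpha} \in S^{bl}$.

$\alpha^{bl}(X) := \{ p(t) \mid p(t) \in X$ and $depth(p(t)) \leq k + 1 \}$

$\gamma^{bl}(X^{\alpha}) := \{ p(t)\vartheta \mid p(t) \in X^{\alpha} \}$

Lemma 2. $\langle \alpha^{bl}, \gamma^{bl} \rangle$ is a reversed Galois insertion, i.e.,

$\alpha^{bl}(\bigcap X_i) = \bigcap(\alpha^{bl}(X_i))$.

The above lemma holds since $\alpha^{bl}$ just selects those atoms whose depth is
not greater than k.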

The optimal Nunif operator is defined in the usual way. 

Definition 14. 

lMunifpJ^.j(A) = { p(t)-0 I depth(p(t)) < k + 1 

and $p(t)\vartheta$ does not unify with A}

and the optimal abstract fixpoint operator turns out to be 

Definition 15. Let I G 

Tp^ (I) = { p(t) I depth.(p(t)) < k + 1 and 

for every clause defining p, p(t) :- B $\in$ P,

p(t) G upf^,^')(Nunif^[^)(p(t)) U 

{p(t)-0 I depth(p(t)-0) < k + 1 , 

•9 is a relevant substitution for p(t), 

BdGupr'(C) }) 

where C — {Bcr | B = Bi , . . . , 3 Bicr G I}. 

As in the case of the upward approximation, $lfp(T_P^{FF^{bl}})$ is effectively
computable.

Example 6. Consider the program $P_1$ of Example 1. For k = 2,

while for k = 3, 

$lfp(T_{P_1}^{FF^{bl}}) = \{q(f(f(X))), q(f(f(a))), q(f(X)), q(f(a)), p(a), p(f(a)), p(f(f(a)))\}$.

Consider $P_2$ of Example 1. For k = 2,

$lfp(T_{P_2}^{FF^{bl}}) = \{q(f(X)), q(f(a)), q(a), q(X), p(a), p(X), p(f(X)), p(f(a))\}$.

Finally, $lfp(T_P^{FF^{bl}}) = \alpha^{bl}(lfp(T_P^{FF}))$. Of course this is not always
the case, as shown in the next example.



$P_5$: q(a) :- p(X).
p(X) :- s(f(f(a))).

For k = 2, $lfp(T_{P_5}^{FF^{bl}}) = \{q(f(X)), q(f(a)), q(f(b)), q(b), s(a), s(f(a)), s(b), s(f(b))\}$,
which is not equal to $\alpha^{bl}(lfp(T_{P_5}^{FF})) = lfp(T_{P_5}^{FF^{bl}})$, for k = 3.

5 Abstract Finite Failure Verification 

As already mentioned, we will use the upward abstraction $\alpha^{up}$ of the least
fixpoint of $T_P^{FF}$ and the downward abstraction $\alpha^{bl}$ of
$T_P^{FF} \downarrow \omega$, together with two corresponding specifications:

- $S_{\alpha^{up}}$ is the $\alpha^{up}$ abstraction of the intended Non-Ground Finite
Failure set.

- $S_{\alpha^{bl}}$ is the $\alpha^{bl}$ abstraction of the intended set of atoms which
either finitely fail or (universally) do not terminate. Alternatively,
$S_{\alpha^{bl}}$ can be viewed as the complement of the set of atoms (of depth
$\leq k$) which have a successful derivation.



Definition 16. Let P be a program. P is correct w.r.t. the finitely failed atoms
not deeper than k if

C1 $\alpha^{up}(lfp(T_P^{FF})) \subseteq S_{\alpha^{up}}$.

C2 $S_{\alpha^{bl}} \subseteq \alpha^{bl}(T_P^{FF} \downarrow \omega)$.

The previous conditions assure us not only that the program is correct w.r.t.
finitely failed atoms not deeper than k, but also that the set of depth k successful
atoms is correct w.r.t. the complement (w.r.t. $\alpha^{bl}(B_V)$) of $S_{\alpha^{bl}}$.

The following theorem gives us sufficient effectively computable conditions
for C1 and C2 to hold.

Theorem 3. Let P be a program. If the following conditions SC1 and SC2 hold

SC1 $T_P^{FF^{up}}(S_{\alpha^{up}}) \subseteq S_{\alpha^{up}}$.

SC2 $S_{\alpha^{bl}} \subseteq T_P^{FF^{bl}}(S_{\alpha^{bl}})$.

then P is correct w.r.t. the finitely failed atoms not deeper than k.

Note that, as was the case for abstract diagnosis [7], correctness is defined in 
terms of abstractions of the concrete semantics, while the sufficient conditions 
are given in terms of the (approximated) abstract operators. 

The following examples show that, by using both SC1 and SC2, we can get
more precise verification conditions.
Example 7. Consider the following program for list concatenation.

$P_1$: append([], X, X) :- list([]).
append([X|Y], Z, T) :- append(Y, Z, [X|T]).
list([]).
list([X|Y]) :- list(Y).

Consider now a specification Saup on the domain defined, for a given k, as 

{ append(Xi ,X2,X3) | Xt G and there exists a j such that 

Xj is not a list}U 

{$\alpha_{aux}$(append($X_1, X_2, X_3$)) | each $X_i$ is a list but
$X_3$ is not unifiable with $X_1 \cdot X_2$} $\cup$
{$\alpha_{aux}$(list(X)) | X is not a list}

where $\alpha_{aux}(p(t))$ replaces each subterm of t of depth greater than k with a
new fresh variable belonging to $\tilde{V}$.

It is easy to see that $T_{P_1}^{FF^{up}}(S_{\alpha^{up}}) \subseteq S_{\alpha^{up}}$. Hence,
according to Theorem 3, $\alpha^{up}(lfp(T_{P_1}^{FF})) \subseteq S_{\alpha^{up}}$ holds and
the program is correct w.r.t. the intended depth k finite failure set.

Consider now the specification $S_{\alpha^{bl}}$, which is the intended complement of
the depth k set of atoms which have a successful derivation and which, for a
given k, is

{append($X_1, X_2, X_3$) | $depth(X_i) \leq k$, $\exists$ a j such that $X_j$ is not a list} $\cup$
{append($X_1, X_2, X_3$) | $depth(X_i) \leq k$, each $X_i$ is a list but
$X_3$ is not unifiable with $X_1 \cdot X_2$} $\cup$
{list(X) | $depth(X) \leq k$ and X is not a list}

Note that in this case $S_{\alpha^{bl}} \not\subseteq T_{P_1}^{FF^{bl}}(S_{\alpha^{bl}})$, for
any k > 1: append([], a, a) belongs to $S_{\alpha^{bl}}$, yet append([], a, a) does not
belong to $T_{P_1}^{FF^{bl}}(S_{\alpha^{bl}})$.

Something goes wrong in this case: append([], a, a) should fail according to the
intended specification. However, in $P_1$ append([], a, a) has a successful
derivation. This means that, in this case,
$S_{\alpha^{bl}} \not\subseteq \alpha^{bl}(T_{P_1}^{FF} \downarrow \omega)$.



Example 8. Let $P_2$ be the program obtained from $P_1$ of Example 7 by replacing
the first clause of $P_1$ by

append([], X, X) :- list(X).

Assume $S_{\alpha^{up}}$ and $S_{\alpha^{bl}}$ as in Example 7. Now, for a given k,
$T_{P_2}^{FF^{up}}(S_{\alpha^{up}}) \subseteq S_{\alpha^{up}}$. Moreover,
$S_{\alpha^{bl}} \subseteq T_{P_2}^{FF^{bl}}(S_{\alpha^{bl}})$. This implies that $P_2$ is
correct w.r.t. the depth k finite failure. This means that the depth k finite
failure set satisfies the expected $S_{\alpha^{up}}$ and also that the depth k set of
successful atoms in $P_2$ satisfies the complement of $S_{\alpha^{bl}}$.
SC1 and SC2 are just sufficient conditions. Hence, if they do not hold we
cannot conclude that we have a bug in the program. However, this is often the
case, and violations of the conditions can be viewed as warnings about possible
errors.

For example, assume that SC1 does not hold. We can say that there exists a
p(t) such that, for all instances of the clauses defining p(t),
p(t) :- $B_1, \ldots, B_n$, the goal $\alpha^{up}(B_1, \ldots, B_n)$ finitely fails in
$S_{\alpha^{up}}$, yet $p(t) \notin S_{\alpha^{up}}$. There might be a missing clause,
which would allow p(t) to either succeed or have an infinite derivation, as
required by $p(t) \notin S_{\alpha^{up}}$.

Assume SC2 does not hold.

This means that there exist a $p(t) \in S_{\alpha^{bl}}$, an instance of a clause
p(t) :- $B_1, \ldots, B_n$, and an i, such that $\forall h \in S_{\alpha^{bl}}$, h does
not unify with $B_i\sigma$. There might be an error in the clause, which
corresponds to a missing successful derivation of p(t).

Example 9. Let $P_3$ be the program obtained from $P_1$ of Example 7 by removing
the first clause. Assume k > 3. Then
$T_{P_3}^{FF^{up}}(S_{\alpha^{up}}) \not\subseteq S_{\alpha^{up}}$. Note, in fact, that
append([], [a], [a]) $\in T_{P_3}^{FF^{up}}(S_{\alpha^{up}})$, yet
append([], [a], [a]) $\notin S_{\alpha^{up}}$. This means that some clause for the
procedure append is missing, which would cause append([], [a], [a]) to have a
successful or infinite derivation.

Consider now $P_1$ as in Example 7. Here
$S_{\alpha^{bl}} \not\subseteq T_{P_1}^{FF^{bl}}(S_{\alpha^{bl}})$. For example,
append([], a, a) belongs to $S_{\alpha^{bl}}$, yet append([], a, a) does not belong to
$T_{P_1}^{FF^{bl}}(S_{\alpha^{bl}})$. The problem here is that there is a wrong clause,
append([], X, X) :- list([]), which forces append([], a, a) to have a successful
derivation, while append([], a, a) is expected to have a finite failure.

Let us finally note that the above notions are related to the notions of error
and co-error as defined in [13].

6 Conclusion 

In this paper we have introduced a new fixpoint semantics which models finite
failure. This semantics is considered as the basis for a verification method which
extends to finite failure Ferrand’s approach [13], which uses two semantics (a
least fixpoint and a greatest fixpoint semantics) and two specifications. We apply
Ferrand’s approach using a least fixpoint semantics and a
$T_P^{FF} \downarrow \omega$ semantics. By defining an approximation from above and
an approximation from below, which give two different finite approximations of
the Non-Ground Finite Failure set and of the success set, we make the extension
of Ferrand’s verification method to finite failure effective.

One may wonder whether there exist other abstract domains which can be
used to derive meaningful sufficient conditions for effective verification of finite
failure. One idea which we are currently pursuing is to use the abstract domain
of assertions as defined in [23,17]. In this case the abstract domain is a set of
assertions which are formulas in a logic language. This would allow us to express
the intended behavior using a very natural and intuitive formalism. As shown in
[23,17], the proof that a verification condition holds boils down to proving that
a formula is valid in a particular model. An interesting result is that whenever
the assertion language is decidable [24] the verification conditions can be
effectively checked.
Abstract. We present a slicing approach for analyzing logic programs
with respect to non-termination. The notion of a failure-slice is presented,
which is an executable reduced fragment of the program. Each failure-slice
represents a necessary termination condition for the program. If
a failure-slice does not terminate it can be used as an explanation for
the non-termination of the whole program. To effectively determine useful
failure-slices we combine a constraint based static analysis with the
dynamic execution of actual slices. The current approach has been integrated
into a programming environment for beginners. Further, we show
how our approach can be combined with traditional techniques of termination
analysis.



1 Introduction 

Understanding the termination behavior of logic programs is rather difficult due
to their complex execution mechanism. Two different intertwined control flows
(AND and OR) cause a complex execution trace that cannot be followed easily in
order to understand the actual reason for termination or non-termination. The
commonly used procedure box model, introduced in [2] for debugging, produces a
huge amount of detailed traces with no relevance to the actual termination
behavior. Similarly, the notion of proof trees is not able to explain
non-termination succinctly.

Current research in termination analysis of logic programs focuses on the 
construction of termination proofs. Either a class of given queries is verified to 
guarantee termination, or — more generally — this class is inferred [9]. In both 
cases that class of queries is a sufficient termination condition and often smaller 
than the class of actually terminating queries. Further this class is described in a 
separate formalism different from logic programs. Explanations why a particular 
query does not terminate are not directly evident. 

* On leave from: Technische Universität Wien, Institut für Computersprachen
We present a complementary approach that is able to localize and explain
reasons for non-termination using a newly developed slicing technique based on
the notion of failure-slices. Failure-slices expose those parts of the program that
may cause non-termination; under certain conditions, non-termination can be
proved.

Slicing [15] is an analysis technique to extract parts of a program related to
or responsible for a particular phenomenon (e.g. a wrong value of a variable).
Originally, slicing was developed for imperative languages by Weiser [15,16], who
observed that programmers start debugging a program by localizing the area
where the error has to be. Using program analysis techniques, this process can
be partially automated, simplifying the comprehension of the program. Only
recently has slicing been adapted to logic programming languages by Zhao [17],
Gyimóthy [5], and Ducassé [14]. While these approaches focus on explaining
(possibly erroneous) solutions of a query, we will present a slicing technique for
explaining non-termination properties. It is an implementation of a previously
developed informal reading technique used in Prolog courses [11,12] which is
used within a programming environment for beginners [13].

In contrast to most other programming paradigms, there are two different
notions of termination of logic programs: existential [8] and universal
termination. A query terminates existentially if one (or no) solution can be found.
Universal termination requires the complete SLD-tree to be finite [4]. While
existential termination is easy to observe, it turned out to be rather difficult to
reason about. On the other hand, universal termination, while difficult to
observe, is much easier to treat formally. Further, universal termination is more
robust to typical program changes that happen during program development.
Universal termination is sensitive only to the computation rule but insensitive
to clause selection. As has been pointed out by Plümer [7], the conjunction of two
universally terminating goals always terminates. Further, reordering and
duplicating clauses has no influence. For these reasons, most research on
termination has focused on universal termination. We will consider universal
termination with the leftmost computation rule, as used for Prolog programs.



Example. The following example contains an erroneous data base causing uni- 
versal non-termination of the given query. Its non-termination cannot be easily 
observed by inspecting the sequence of produced solutions. Glancing over the first 
solutions suggests a correct implementation. But in fact, an infinite sequence of 
redundant solutions is produced. The failure-slice on the right, generated auto- 
matically by the presented method, locates the reason for non-termination by 
hiding all irrelevant parts. The remaining slice has to be changed in order to 
make the program terminating. 

The failure-slice helps significantly in understanding the program’s termination
property. It shows, for example, that clause reordering in ancestor_of/2 does
not help here since this would lead to the same slice. Further, it becomes evident
that the first rule in ancestor_of/2 is not responsible for termination. Beginners
often hold this incorrect belief, confusing universal and existential termination.
% original program
← ancestor_of(Anc, leopold_I).
child_of(karl_VI, leopold_I).
child_of(maria_theresia, karl_VI).
child_of(joseph_II, maria_theresia).
child_of(leopold_II, maria_theresia).
child_of(leopold_II, franz_I).
child_of(marie_a, maria_theresia).
child_of(franz_I, leopold_II).

ancestor_of(Anc, Desc) ←
   child_of(Desc, Anc).
ancestor_of(Anc, Desc) ←
   child_of(Child, Anc),
   ancestor_of(Child, Desc).



% failure-slice (hidden parts are shown commented out)
← ancestor_of(Anc, leopold_I).
% child_of(karl_VI, leopold_I).
% child_of(maria_theresia, karl_VI).
% child_of(joseph_II, maria_theresia).
% child_of(leopold_II, maria_theresia).
child_of(leopold_II, franz_I).
% child_of(marie_a, maria_theresia).
child_of(franz_I, leopold_II).

% ancestor_of(Anc, Desc) ← false,
%    child_of(Desc, Anc).
ancestor_of(Anc, Desc) ←
   child_of(Child, Anc),
   ancestor_of(Child, Desc), false.



This example shows also some requirements for effectively producing failure- 
slices. On the one hand we need an analysis to identify the parts of a program 
responsible for non-termination. On the other hand, since such an analysis can 
only approximate the minimal slices, we need an efficient way to generate all 
slices which then are tested for termination by mere execution for a certain 
time. With the help of this combination of analysis and execution we often obtain 
explanations also when classical termination analysis cannot produce satisfying 
results. 



Contents. The central notions failure-slice and minimal explanation are pre- 
sented in Section 2. Some rules are given in Section 3 that must hold for minimal 
explanations. Section 4 presents our implementation. Finally we discuss how our 
approach is adapted to handle some aspects of full Prolog. A complete example is 
found in the appendix. We conclude by outlining further paths of development. 



2 Failure-Slices 

In the framework of the leftmost computation rule, the query ← G terminates
universally iff the query ← G, false fails finitely. Transforming a program with
respect to this query may result in a more explicit characterization of universal
termination. However, the current program transformation frameworks like
fold/unfold are not able to reduce the responsible program size in a significant
manner. We will therefore focus our attention on approximations in the
form of failure-slices.
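The equivalence between universal termination of ← G and finite failure of ← G, false can be observed on propositional programs with a small bounded interpreter. The Python sketch below is only an illustration under stated assumptions: programs are dictionaries from an atom to its clause bodies, `false` is simply an atom without clauses, and the step budget stands in for "executing for a certain time". The function name `fails_finitely` is hypothetical.

```python
def fails_finitely(program, goals, budget=1000):
    """Depth-first search with the leftmost computation rule.

    Returns True if the goal fails finitely, False if a solution is
    found, and None if the step budget runs out first.
    """
    stack = [list(goals)]
    steps = 0
    while stack:
        if steps >= budget:
            return None
        steps += 1
        goal = stack.pop()
        if not goal:
            return False              # empty goal: a solution exists
        first, rest = goal[0], goal[1:]
        # Push alternatives in reverse so the first clause is tried first.
        for body in reversed(program.get(first, [])):
            stack.append(list(body) + rest)
    return True


# p.   p :- p.      (p has a solution but does not terminate universally)
PROGRAM = {"p": [[], ["p"]]}
# The same program with "false" inserted before the recursive goal.
SLICE = {"p": [[], ["false", "p"]]}

print(fails_finitely(PROGRAM, ["p"]))            # False
print(fails_finitely(PROGRAM, ["p", "false"]))   # None
print(fails_finitely(SLICE, ["p", "false"]))     # True
```

Running out of budget is no proof of non-termination; it only flags the query as a candidate.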

Definition 1 (Program point). The clause $h \leftarrow g_1, \ldots, g_n$ has a program
point $p_i$ on the leftmost side of the body and after each goal. A clause with n
goals therefore has the following n+1 program points:
$h \leftarrow p_i,\ g_1\, p_{i+1}, \ldots, g_n\, p_{i+n}$. We label
all program points of a program in some global order starting with the initial
query. Program points in the query are defined analogously. We denote the set
of all program points in program P with query Q by p(P, Q).
Definition 2 (Failure-slice).

A program S is called a failure-slice of a program P with query Q if S contains
all clauses of P and the query Q with the goal “false” inserted at some program
points. We represent a failure-slice by the subset of program points where “false”
has not been inserted (i.e., where “true” has been inserted).

The trivial failure-slice is p(P, Q), that is, the program itself. For a program
with n program points there are $|\mathcal{P}(p(P, Q))| = 2^n$ possible failure-slices.
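The subset representation of failure-slices lends itself to a direct enumeration. The following Python sketch is an illustration only; the function names and the textual rendering of clauses are assumptions made here, not the implementation described in the paper.

```python
from itertools import combinations


def failure_slices(points):
    """Yield every subset of program points, i.e. every candidate failure-slice."""
    ordered = sorted(points)
    for size in range(len(ordered) + 1):
        for kept in combinations(ordered, size):
            yield frozenset(kept)


def render(head, goals, points, kept):
    """Show one clause with "false" inserted at every point that is not kept."""
    body = []
    # A clause with n goals has n + 1 points: one before each goal, one at the end.
    for point, goal in zip(points, goals + [None]):
        if point not in kept:
            body.append("false")
        if goal is not None:
            body.append(goal)
    return f"{head} <- {', '.join(body)}." if body else f"{head}."


print(sum(1 for _ in failure_slices(range(6))))       # 64
print(render("h", ["g1", "g2"], [1, 2, 3], {1, 3}))   # h <- g1, false, g2.
print(render("h", [], [1], set()))                    # h <- false.
```

Six program points already give 64 candidates, which is why the search space has to be pruned before slices are executed.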

Example 1. For predicate list_invdiff/3 the set of program points p(P, Q) is the
set of integers {0, 1, 2, 3, 4, 5}. On the right, the slice {0, 2, 4} is shown.



← /*P0*/ list_invdiff(Xs, [1,2,3], []). % P5
list_invdiff([], Xs, Xs). % P1
list_invdiff([E|Es], Xs0, Xs) ← % P2
   list_invdiff(Es, Xs0, Xs1), % P3
   Xs1 = [E|Xs]. % P4



← list_invdiff(Xs, [1,2,3], []), false.
list_invdiff([], Xs, Xs) ← false.
list_invdiff([E|Es], Xs0, Xs) ←
   list_invdiff(Es, Xs0, Xs1), false,
   Xs1 = [E|Xs].



Definition 3 (Partial order). A failure-slice S is smaller than T if $S \subseteq T$.



Theorem 1. Let P be a definite program with query Q and let S and T be
failure-slices of P, Q with $S \subseteq T$. If Q does not left-terminate in S then Q
does not left-terminate in T.

Proof. Consider the SLD-tree for the query Q in S. Since Q does not terminate,
the SLD-tree is infinite. The SLD-tree for T contains all branches of S and
therefore will also be infinite. □



Definition 4 (Sufficient explanation). A sufficient explanation E is a subset
of $\mathcal{P}(p(P, Q))$ such that for every non-terminating slice $S \notin E$, there
is a non-terminating slice $T \in E$ such that $T \subset S$. The trivial sufficient
explanation is $\mathcal{P}(p(P, Q))$.



Example 2. A sufficient explanation of list_invdiff/3 is {{0,1}, {5}, {0,2}, {0,2,4}}. 
The slices {0,1} and {5} are terminating and therefore cannot help to explain 
non-termination. Slice {0,2,4} is a superset of {0,2}. Some other non-terminating 
slices are {0, 2, 3}, ..., {0, 1, 2}, ..., {0, 1, 2, 3,4, 5}. We note that there always exists 
a unique smallest sufficient explanation gathering all the minimal failure-slices. 



Definition 5 (Minimal explanation). The minimal explanation is the sufficient
explanation with minimal cardinality.

The minimal explanation contains only non-terminating slices that form an
anti-chain (i.e., that are not included in each other). In our example, {{0,2}}
is the minimal explanation since all other non-terminating slices are supersets
of {0,2}.
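Sufficient and minimal explanations can be computed by brute force once it is known which slices loop. In the Python sketch below that knowledge is supplied by an oracle, which is an assumption of the illustration: following the running example, a slice is taken to be non-terminating exactly when it contains the points 0 and 2. In general no such oracle exists, since termination is undecidable. All function names are hypothetical.

```python
from itertools import combinations


def all_slices(points):
    """Every subset of the given program points."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def non_terminating(s):
    """Oracle for the running example (assumed, not computed)."""
    return {0, 2} <= s


def is_sufficient(explanation, slices, loops):
    """Every looping slice outside E has a looping proper subset inside E."""
    inside = [t for t in explanation if loops(t)]
    return all(
        any(t < s for t in inside)
        for s in slices
        if loops(s) and s not in explanation
    )


def minimal_explanation(slices, loops):
    """The anti-chain of looping slices with no looping proper subset."""
    looping = [s for s in slices if loops(s)]
    return {s for s in looping if not any(t < s for t in looping)}


SLICES = all_slices(range(6))
E = {frozenset({0, 1}), frozenset({5}), frozenset({0, 2}), frozenset({0, 2, 4})}

print(is_sufficient(E, SLICES, non_terminating))                  # True
print(is_sufficient({frozenset({5})}, SLICES, non_terminating))   # False
print(minimal_explanation(SLICES, non_terminating))               # {frozenset({0, 2})}
```

The terminating slices {0,1} and {5} do not affect sufficiency, and removing everything except {0,2} from E yields the minimal explanation.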

The minimal explanation is an adequate explanation of non-termination, 
since it contains all minimal slices that imply the non-termination of the whole 
program with the given query. It helps to correct the program, because in all min- 
imal slices some parts highlighted by the minimal explanation must be changed 
in order to avoid non-termination. As long as the highlighted part remains com- 
pletely unchanged, the very same non-terminating failure-slice can be produced. 
Further, in our experience, minimal explanations are very small compared to 
the set of possible explanations. For example, the minimal explanation of the 
program in the appendix contains one out of 128 possible slices. 

Proposition 1. Q left-terminates w.r.t. P iff the minimal explanation of P,Q 
is empty. 

The undecidability of termination therefore immediately implies that mini- 
mal explanations cannot be determined in general. For this reason we approach 
the problem from two different directions. First, we focus on determining small 
slices. Second, we try to obtain a proof of (universal) non-termination for each 
slice in the explanation. If all slices are non-terminating, the minimal set has 
been calculated. 

Currently, we use a simple loop checker for proving universal non-termination 
that aborts signaling non-termination if a subsuming variant A of an atom A' 
that occurred in an earlier goal is considered. While this loop check may incor- 
rectly prune some solutions ([1], e.g., ex. 2.1.6), it is sufficient to prove universal 
non-termination.

The first major obstacle when searching for a non-terminating failure-slice
is the large search space that has to be considered, whereas the size of the
minimal explanation is typically very small. For a program with n points there
are 2^n different slices, most of them uninteresting, either because they are
terminating or because there is a smaller slice that describes the same properties.

3 Failure Propagation 

In order to narrow down the set of potential slices, we formulate some criteria 
that must hold for slices in the minimal explanation. With the help of these 
rules, many slices are removed that can never be part of the minimal explanation. 
These rules are directly implemented, by imposing the corresponding constraints 
on the program points that are represented with boolean variables. 

Throughout the following rules we use the following names for program
points. An entry/exit point of a predicate is a program point immediately
before/after a goal for that predicate in some clause body or the initial query. A
beginning/ending point is the first/last program point in a clause.

Right-propagating rules

Program points that will never be used do not occur in a slice of the minimal
explanation. The following rules determine some of them. These rules encode
the leftmost computation rule.

R1: Unused points. In a clause, a failing program point p_i implies the next
point p_{i+1} to fail.

R2: Unused predicates. If all entry points of a predicate fail, all corresponding
beginning program points fail.

R3: Unused points after failing definition. If in all clauses of a predicate the
ending program points fail, then all corresponding exit points fail.

R4: Unused points of recursive clauses. If in all clauses that do not contain
a direct recursion the ending points fail, then all ending points fail. A
predicate can only be true if its definition contains at least one
non-recursive clause.

Left-propagating rules

Program points that are only part of a finite failure branch cannot be part
of a slice in the minimal explanation.

L1: Failing definitions. If all beginning program points of a predicate fail, then
all entry points fail.

L2: Propagation over terminating goals. A failing program point p_{i+1}
implies p_i to fail if g_{i+1} terminates. Safe approximations of the
termination of a goal are described below.

L3: Left-propagation of failing exit points. If all exit points of a predicate
except those after a tail recursion fail, then all ending points fail.

Local recursions

Some infinite loops can be immediately detected by a clausewise inspection.
Currently we consider only direct left recursions.

M1: Local left recursion. In a clause of the form h ← g_1, ..., g_n, g, ... a failure
is inserted after goal g if, for all substitutions θ, gθ is unifiable with h
and the sequence of goals g_1, ..., g_n can never fail. In this case it is
also ensured that the program never terminates.



Example 3. In the following clause, rule M1 sets the program point after the
recursive goal unconditionally to false. Thereby the end point is also set to false
due to rule R1. The original clause is shown on the left, the sliced clause on
the right.

ancestor_of(Anc,Desc) ←                ancestor_of(Anc,Desc) ←
    ancestor_of(Child, Desc),              ancestor_of(Child, Desc), false,
    child_of(Child, Anc).                  child_of(Child, Anc), false.   % struck out

A detailed example that shows the usage of the other rules is given in the ap- 
pendix. 



Proposition 2 (Soundness of propagating rules). If a slice is eliminated
with the above rules, this slice does not occur in the minimal explanation.

Proof. For the right propagating rules R1-R4 it is evident that the mentioned
program points will never be used with the leftmost computation rule. Therefore 
the program points may be either true or false, the minimal explanation therefore 
will contain false. 

The left propagating rules prune some finite failure branches. The minimal 
explanation will not contain these branches. The main idea is therefore to ensure 
that all infinite parts are preserved. 

L1: When all beginning program points fail, a finite failure branch is encountered.
By setting all entry points to false, only this finite branch is eliminated.

L2: A terminating goal with a subsequent false generates a finite failure 
branch, which thus can be eliminated completely. 

L3: Consider first the simpler case, when all exit points of a predicate fail. 
In this case, all ending points may be true or false, without removing a branch. 
A minimal slice therefore will contain just false at these places. 

The ending point in a tail-recursive clause has no impact on the failure branch 
generated by the predicate, as long as all exit points are false. 

M1: This rule describes a never terminating local left recursion. The minimal
explanation may thus contain this loop. The subsequent points after the loop 
are therefore not needed in a minimal slice. □ 

While the presented rules can be used to generate a sufficient explanation,
they are still too general to characterize a minimal explanation, in particular
because no rule except M1 takes information about the arguments into
account.



Safe Approximation of Termination 

Rule L2 needs a safe approximation for the termination property. If the call 
graph of a (sliced) predicate does not contain cycles, the predicate will always 
terminate. For many simple programs (like perm/2 described in the annex), the 
minimal explanation can already be determined with this simple approximation 
that does not take the information about data flow into account. 
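The acyclicity test on the call graph can be sketched as follows in Python. The dictionary encoding of the call graph, the function name predicates_on_cycles, and the particular sliced graph are assumptions made for illustration; only the perm/2 and del/3 dependencies are taken from the appendix program.

```python
def predicates_on_cycles(call_graph):
    """Return the predicates that can reach themselves in the call graph.

    call_graph maps each predicate to the set of predicates called in its
    clause bodies. A predicate outside the result never takes part in a
    cycle, so it is safely approximated as always terminating.
    """
    def reachable_from(start):
        seen, todo = set(), list(call_graph.get(start, ()))
        while todo:
            pred = todo.pop()
            if pred not in seen:
                seen.add(pred)
                todo.extend(call_graph.get(pred, ()))
        return seen

    return {p for p in call_graph if p in reachable_from(p)}


# Call graph of the unsliced perm/2 program: both predicates are recursive.
full = {"perm/2": {"del/3", "perm/2"}, "del/3": {"del/3"}}
# Hypothetical slice in which both recursive calls have been cut off.
sliced = {"perm/2": {"del/3"}, "del/3": set()}

print(predicates_on_cycles(full))    # both predicates lie on a cycle
print(predicates_on_cycles(sliced))  # set()
```

The check ignores all argument information, which is why it is safe but coarse.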

For many recursive predicates, however, this very coarse approximation leads
to imprecise results, yielding a large sufficient explanation. We sketch the
approach we are currently evaluating to combine our constraint-based termination
analysis [9] with rule L2. We recall that the mentioned termination prover infers
for each predicate p a boolean term Ct called its termination condition. If the
boolean version of a goal ← p(t) entails Ct, then universal left-termination of the
original goal ← p(t) is ensured.

In order to apply rule L2 w.r.t. P, Q, we first tabulate [10] the boolean version 
of P, Q. Then, if all boolean call patterns for p entail Ct, rule L2 can be safely 
applied to such goals.

4 Implementation

Our implementation uses finite domain constraints to encode the relations be- 
tween the program points. Every program point is represented by a boolean 0/1- 
variable where 0 means that the program point fails. In addition every predicate 
has a variable whose value indicates whether that predicate is always terminating 
or not. We refer to the appendix for a complete example. 



4.1 Encoding the Always-Terminating Property

While it is possible to express cycles in finite domains directly, they are not 
efficiently reified in the current CLP(FD) implementation of SICStus-Prolog [3]. 
For this reason we use a separate pass for detecting goals that are part of a cycle. 
These goals are (as an approximation) not always-terminating. 

A predicate is now always-terminating if it contains only goals that are 
always-terminating. The encoding in finite domain constraints is straightfor- 
ward. Each predicate gets a variable AlwTerm. In the following example we 
assume that the predicates r/0 and s/0 do not form a cycle with q/0. So only 
q/0 forms a cycle with itself.

q ← (P0) r, (P1) s, (P2) q (P3).

AlwTermQ ⇔ (¬P0 ∨ AlwTermR) ∧ (¬P1 ∨ AlwTermS) ∧ (¬P2 ∨ 0)

If a separate termination analysis is able to determine that q/0 terminates
for all uses in P, Q, the value of AlwTermQ can already be set accordingly.
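The always-terminating condition for q/0 can be evaluated directly for a given assignment of the program points. The Python function below is only an illustrative reading of the formula as an equivalence; the function and parameter names are assumptions, and the constant False stands for the recursive goal q, which lies on a cycle.

```python
def alw_term_q(p0, p1, p2, alw_term_r, alw_term_s):
    """AlwTermQ for the clause  q <- (P0) r, (P1) s, (P2) q (P3).

    A goal only matters if the program point in front of it succeeds (is 1).
    """
    return bool((not p0 or alw_term_r)
                and (not p1 or alw_term_s)
                and (not p2 or False))   # q itself is part of a cycle


print(alw_term_q(1, 1, 1, True, True))  # False: the recursive call is reachable
print(alw_term_q(1, 1, 0, True, True))  # True: a failure at P2 cuts the cycle
```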



4.2 Failure Propagation 

The rules for minimal explanations can be encoded in a straightforward manner.
For example, rule R1 is encoded for predicate q/0 as follows:

¬P0 ⇒ ¬P1, ¬P1 ⇒ ¬P2, ¬P2 ⇒ ¬P3.
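To see how much a single rule already prunes, the R1 implications for the four program points of q/0 can be checked by brute force. The Python sketch below is an illustration only; the names satisfies_r1 and survivors are assumptions, and plain enumeration stands in for the finite domain solver.

```python
from itertools import product


def satisfies_r1(points):
    """R1: within one clause, a failing point (0) forces the next point to fail."""
    return all(points[i] == 1 or points[i + 1] == 0
               for i in range(len(points) - 1))


# All assignments of (P0, P1, P2, P3) that respect R1.
survivors = [v for v in product((0, 1), repeat=4) if satisfies_r1(v)]
print(survivors)
# [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)]
```

Only 5 of the 16 assignments remain: once a point fails, everything to its right in the clause fails as well.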

4.3 Labeling and Weighting 

Since we are interested in obtaining minimal explanations, we use the number 
of program points as a weight to ensure that the smallest slices are considered 
first. The most straightforward approach simply tries to maximize the number 
of failing program points. To further order slices with the same number of pro- 
gram points, we prefer those slices that contain a minimal number of predicates. 
Therefore we use three weights in the following order. 

1. minimal number of program points that succeed

2. minimal number of predicates

3. minimal number of clauses

These weights lead naturally to an implementation in finite domain
constraints. By labeling these weights in the above order, we obtain minimal
solutions first. Furthermore, only those solutions that do not extend already found
minimal slices are considered.
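The effect of labeling by weight and skipping extensions of already found slices can be sketched as follows. This Python fragment simplifies the scheme to a single weight (the number of succeeding program points), and the predicate is_nonterminating is a hypothetical oracle standing in for the execution and loop-check phases.

```python
from itertools import combinations


def minimal_nonterminating(n_points, is_nonterminating):
    """Enumerate slices by increasing size and collect the minimal ones.

    A candidate that extends an already found slice is skipped without
    consulting the oracle.
    """
    found = []
    for size in range(n_points + 1):
        for combo in combinations(range(n_points), size):
            candidate = frozenset(combo)
            if any(m <= candidate for m in found):
                continue  # extension of a known minimal slice
            if is_nonterminating(candidate):
                found.append(candidate)
    return found


# Hypothetical oracle for Example 2: a slice loops iff it contains {0,2}.
def oracle(candidate):
    return {0, 2} <= candidate


print(minimal_nonterminating(6, oracle))  # [frozenset({0, 2})]
```

Because candidates are visited smallest first, every slice that is reported is minimal with respect to set inclusion.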

4.4 Execution of Failure-Slices 

With the analysis so far we are already able to reduce the number of potentially 
non-terminating failure-slices. However, our analysis just as any termination 
analysis is only an approximation to the actual program behavior. Since failure- 
slices are executable we execute the remaining slices to detect potentially non- 
terminating slices. With the help of the built-in time_out/3 in SICStus Prolog 
a goal can be executed for a limited amount of time. In most situations the
failure-slices will detect termination very quickly because the search space of the
failure-slice is significantly smaller than that of the original program.

Instead of compiling every failure-slice for execution we use a single enhanced 
program — a generic failure-slice — which is able to emulate all failure-slices in 
an efficient manner. 

Generic failure-slice. All clauses of the program are mapped to clauses with a
further auxiliary argument that holds a failure-vector, a structure with sufficient
arity to hold all program points. At every program point n a goal arg(n,FVect,1)
is inserted. This goal will succeed only if the corresponding argument of the
failure-vector is equal to 1.

p(...) ←                 slicep(...,FVect) ←
                             arg(n1,FVect,1),
    g(...),                  sliceg(...,FVect),
                             arg(n2,FVect,1),
    ...,                     ...,
                             arg(ni,FVect,1),
    r(...).                  slicer(...,FVect),
                             arg(ni+1,FVect,1).
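The idea of one generic program driven by a failure-vector can be imitated for argument-free predicates in Python. This is a sketch under several assumptions: the program is the q/0 clause of Section 4.1 with facts r/0 and s/0, the point numbers 4 and 5 for those facts are invented for illustration, and a step budget replaces the wall-clock limit of time_out/3.

```python
def terminates_within(program, query, fvect, budget):
    """Explore the whole search tree of `query`, leftmost goal first.

    Returns True if the tree is exhausted within `budget` steps (universal
    termination), False if the budget runs out (possibly non-terminating).
    Each ("point", n) item plays the role of arg(n,FVect,1).
    """
    alternatives = [[("call", query)]]       # pending goal lists
    steps = 0
    while alternatives:
        steps += 1
        if steps > budget:
            return False
        goals = alternatives.pop()
        if not goals:
            continue                         # a solution; keep backtracking
        (kind, arg), rest = goals[0], goals[1:]
        if kind == "point":
            if fvect[arg] == 1:              # the point succeeds
                alternatives.append(rest)
        else:                                # resolve the call with every clause
            for clause in reversed(program[arg]):
                alternatives.append(clause + rest)
    return True


program = {
    "q": [[("point", 0), ("call", "r"), ("point", 1), ("call", "s"),
           ("point", 2), ("call", "q"), ("point", 3)]],
    "r": [[("point", 4)]],   # point number assumed
    "s": [[("point", 5)]],   # point number assumed
}

print(terminates_within(program, "q", [1, 1, 1, 1, 1, 1], 10_000))  # False
print(terminates_within(program, "q", [1, 1, 0, 0, 1, 1], 10_000))  # True
```

Changing the failure-vector selects a different slice without touching the program, which is the point of the generic construction.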



4.5 Proof of Non-termination 

Slices that are part of a minimal explanation must all be non-terminating. To
this end, we execute the slice with a simple loop checker under a timeout. Since 
our loop checker is significantly slower than direct execution, we use it as the 
last phase in our system. 

5 Full Prolog 

In this section we will extend the notion of failure-slices to full Prolog. To some 
extent this will reduce the usefulness of failure-slices for programs using impure 
features heavily.

5.1 Finite Domain Constraints

Failure-slices are often very effective for CLP(FD) programs. Accidental loops 
often occur in parts that set up the constraints. For the programmer it is difficult 
to see whether the program loops or simply spends its time in labeling. Since 
labeling is usually guaranteed to terminate, removing the labeling from the pro- 
gram will uncover the actual error. No special treatment is currently performed 
for finite domain constraints. However, we remark that certain non-recursive 
queries effectively do not terminate in SICStus Prolog (or require a very large 
amount of time), like ← S #> 0, S #> T, T #> S. Examples like this cannot
be detected with our current approach. If such goals appear in a non-recursive
part of the program, they will not show up in a failure-slice.



5.2 DCGs 



Definite clause grammars can be sliced in the same way as regular Prolog pred- 
icates. Instead of the goal false, the escape {false} is inserted. 



← phrase(rnaloop, Bases).

rnaloop -->
    {Bs = ...},
    complseq(Bs), {false},
    ...

complseq([B|Bs]) -->
    complseq(Bs), {false},
    ...

(All goals and clauses after the inserted {false}, including the definitions of
list and of the base complements, are eliminated from the slice.)

5.3 Moded Built-Ins 

Built-ins that can only be used in a certain mode like is/2 pose no problems, 
since failure-slices do not alter actual modes. 



5.4 Cut 

The cut operator is a heritage of the early years of logic programming. Its
semantics prevents an effective analysis because, for general usages, cuts require
reasoning about existential termination. Existential termination may be expressed
in terms of universal termination with the help of the cut operator: a goal G
terminates existentially if the conjunction G, ! terminates universally. For this
reason, goals in the scope of cuts and all the predicates these goals depend on
must not be sliced at all. A simple cut at the level of the query therefore
prevents slicing completely.

C1: Goals in front of cuts and all predicates they depend on must not contain
failure points. There must be no loop check in this part.





C2: In the last clause of a predicate, failure points can be inserted anywhere. By
successively applying this rule, the slice of the program may still be reduced.
C3: In all other clauses failures may only be inserted after all cuts.

Notice that these restrictions primarily hinder analysis when using deep cuts.
Using the recommended shallow cuts [6] does not have such a negative impact. In
the deriv benchmark, for example, there are only shallow cuts right after the
head. Therefore, apart from clauses at the end, only program points after the
cuts can be made to fail.
d(U+V,X,DU+DV) ←
    !,
    d(U,X,DU), false,
    d(V,X,DV).             % eliminated from the slice



5.5 Negation 

Prolog’s unsound negation built-in \+/1 (“not”) is handled similarly to cuts. The
goal occurring in the “not” and all the predicates it depends on must not contain
any injected failures. “If-then-else” and if/3 are treated similarly. The
second-order predicates setof/3 and findall/3 permit a more elaborate treatment. The
program point directly after such goals is the same as the one within findall/3
and setof/3. Therefore, failure may be propagated from right to left.



5.6 Side effects 

Side effects must not be present in a failure-slice. However, this does not exclude
the analysis of predicates with side effects completely. When built-ins only
produce side effects that cannot affect Prolog’s control (e.g. a simple write onto a
log file provided that Prolog does not read that file, or reading something once
from a constant file), some failure-slice may still be produced. A failure is
injected before such side-effecting goals, thereby ensuring that the side effect is not
part of the failure-slice. We note that the classification into harmless and harmful
side effects relies on the operating system environment and is therefore beyond
the scope of a programming language.



6 Summary 

To summarize our approach, slicing proceeds in the following manner: 
1. The call graph is analyzed to detect goals that are part of a cycle.

2. A predicate fvectPQ_weights(FVect, Weights) is generated and compiled. It
describes the relation between the program points in P with respect to the
query Q with the help of finite domain constraints. All program points are
represented as variables in the failure-vector FVect. FVect therefore
represents the failure-slice. Weights is a list of values that are functions of the
program points. Currently three weights are used: the number of predicates,
the number of clauses and the number of succeeding program points.

3. The generic failure-slice is generated and compiled. 

4. Now the following query is executed to find failure-slices. 

← fvectPQ_weights(FVect, Weights),
   FVect =.. [_|Fs],
   labeling([], Weights),
   labeling([], Fs),
   time_out(slicePQ(...,FVect), t, time_out),
   loopingslicePQ(...,FVect,Result).

Procedurally the following happens: 

(a) fvectPQ_weights/2 imposes the constraints within FVect and Weights. 

(b) An assignment for the weights is searched, starting from minimal values. 

(c) An assignment for the program points in the failure vector is searched. 
A potential failure-slice is thus generated. 

(d) The failure-slice is actually run for a limited amount of time to discard 
some terminating slices. 

(e) The loop checker is used to determine if non-termination can be proven.

The analysis thus executes on the fly while searching for failure-slices.

7 Conclusion and Future Work 

We presented a slicing approach for termination that combines both static and 
dynamic techniques. For the static analysis we used finite domain constraints 
which turned out to be an effective tool for our task. Usual static analysis con- 
siders a single given program. By using constraints we were able to consider a 
large set of programs at the same time, thereby reducing the inherent search 
space considerably. Since failure-slices are executable their execution helps to 
discard terminating slices. 

Tighter integration of termination proofs. Our approach might be further refined 
by termination proofs. In principle, any system for proving termination could 
be integrated in our system to test whether a particular slice terminates. In 
this manner some more terminating slices can be eliminated from a sufficient 
explanation. There are however several obstacles to such an approach. First, 
most termination proofs are rather costly, in particular, when a large set of 
slices is detected as terminating. We consider using a constraint based approach 
as presented in [9] that will be parameterized by the program points. We expect a
significantly more efficient implementation than one that tests for termination
at the latest possible moment. On the other hand, the precision of the analysis
should not suffer from this generalization.

Stronger rules constraining sufficient explanations. In another direction we are
investigating the formulation of stronger rules to constrain the search space of sufficient
explanations. In particular, in programs such as ancestor_of/2 there is still an
exponential number of slices that must be tested dynamically. We envisage the usage
of dependencies between alternate clauses to overcome this problem.

Argument slicing. The existing slicing approaches [17,5,14] all perform argument 
slicing. We have currently no implementation of argument slicing. While it im- 
proves program understanding, argument slicing does not seem to be helpful for 
further reducing the number of clauses or program points. It appears preferable 
to perform argument slicing after a failure-slice has been found. 

Acknowledgments. The initial work on failure-slices was done within INTAS 
project INTAS-93-1702.

A Failure-Slices for perm/2



% Original program
perm([], []).                          % P1
perm(Xs, [X|Ys]) ←                     % P2
    del(X, Xs, Zs),                    % P3
    perm(Zs, Ys).                      % P4

del(X, [X|Xs], Xs).                    % P5
del(X, [Y|Ys], [Y|Xs]) ←               % P6
    del(X, Ys, Xs).                    % P7

← perm(Xs, [1,2]).                     % P0, P8

% First failure-slice {2,6}; struck-out parts are marked "eliminated"
perm([], []) ← false.                  % eliminated
perm(Xs, [X|Ys]) ←
    del(X, Xs, Zs), false,
    perm(Zs, Ys).                      % eliminated

del(X, [X|Xs], Xs) ← false.            % eliminated
del(X, [Y|Ys], [Y|Xs]) ←
    del(X, Ys, Xs), false.

← perm(Xs, [1,2]), false.
% does not terminate

% 2nd failure-slice {2,3,5}
perm([], []) ← false.                  % eliminated
perm(Xs, [X|Ys]) ←
    del(X, Xs, Zs),
    perm(Zs, Ys), false.

del(X, [X|Xs], Xs).
del(X, [Y|Ys], [Y|Xs]) ← false,        % eliminated
    del(X, Ys, Xs).                    % eliminated

← perm(Xs, [1,2]), false.
% terminates

% Determination of failure-slices to be tested for termination/non-termination
← fvectPQ_weights(FVect,Wghts), FVect=..[_|Ps], labeling([],Wghts), labeling([],Ps).
% FVect = s(0,1,0,0,0,1,0,0), Wghts = [3,2,2].   % {2,6} does not terminate
% FVect = s(0,1,1,0,1,0,0,0), Wghts = [4,2,2].   % {2,3,5} terminates; deleted
% FVect = s(0,1,1,0,1,1,0,0), Wghts = [5,2,3].   % {2,3,5,6} ⊃ {2,6}; not considered
% FVect = s(0,1,1,0,1,1,1,0), Wghts = [6,2,3].   % {2,3,5,6,7} ⊃ {2,6}; not considered
% => The minimal explanation E = {{2,6}}



% Definition of the failure-vector
fvectPQ_weights(s(P1, P2, P3, P4, P5, P6, P7), [NPoints, NPreds, NClauses]) ←
    domain_zs(0..1, [P1, P2, P3, P4, P5, P6, P7]),
    P0 = 1, P8 = 0,    % Given query

    % R1: unused points in clause
    ¬P2 ⇒ ¬P3, ¬P3 ⇒ ¬P4, ¬P6 ⇒ ¬P7,
    % R2: unused predicates
    /*perm/2:*/ ¬P0 ∧ ¬P3 ⇒ ¬P1 ∧ ¬P2,   /*del/3:*/ ¬P2 ∧ ¬P6 ⇒ ¬P5 ∧ ¬P6,
    % R3: failing definition
    /*perm/2:*/ ¬P1 ∧ ¬P4 ⇒ ¬P8 ∧ ¬P4,   /*del/3:*/ ¬P5 ∧ ¬P7 ⇒ ¬P3 ∧ ¬P7,
    % R4: right propagation into recursive clause
    /*perm/2:*/ ¬P1 ⇒ ¬P4,               /*del/3:*/ ¬P5 ⇒ ¬P7,
    % L1: failing definition
    /*perm/2:*/ ¬P1 ∧ ¬P2 ⇒ ¬P3 ∧ ¬P0,   /*del/3:*/ ¬P5 ∧ ¬P6 ⇒ ¬P2 ∧ ¬P7,
    % L2: over (always) terminating goals
    ¬P4 ∧ AlwTermPerm ⇒ ¬P3,   ¬P8 ∧ AlwTermPerm ⇒ ¬P0,
    ¬P3 ∧ AlwTermDel ⇒ ¬P2,    ¬P7 ∧ AlwTermDel ⇒ ¬P6,
    % L3: failing exit points
    ¬P8 ⇒ ¬P1 ∧ ¬P4,   ¬P3 ⇒ ¬P5 ∧ ¬P7,

    % Always terminating
    AlwTermPerm ⇔ (¬P2 ∨ AlwTermDel) ∧ (¬P3 ∨ 0),   AlwTermDel ⇔ (¬P6 ∨ 0),

    % Weights:
    NPreds #= min(1,P1+P2) + min(1,P5+P6),
    NClauses #= P1+P2+P5+P6,
    NPoints #= P0+P1+P2+P3+P4+P5+P6+P7+P8.
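The constraint definition of this appendix can be replayed by brute force. The Python sketch below reads each rule as an implication over the failing points and each AlwTerm definition as an equivalence; this reading, the helper names, and the use of plain enumeration instead of a finite domain solver are assumptions made for illustration.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def satisfies(p):
    """p maps the point numbers 0..8 to 0/1; True if all rules hold."""
    f = {i: p[i] == 0 for i in p}             # f[i]: program point i fails
    alw_del = f[6]                            # (not P6 or 0)
    alw_perm = (f[2] or alw_del) and f[3]     # (not P2 or AlwTermDel) and (not P3 or 0)
    return all([
        implies(f[2], f[3]), implies(f[3], f[4]), implies(f[6], f[7]),   # R1
        implies(f[0] and f[3], f[1] and f[2]),                           # R2
        implies(f[2] and f[6], f[5] and f[6]),
        implies(f[1] and f[4], f[8] and f[4]),                           # R3
        implies(f[5] and f[7], f[3] and f[7]),
        implies(f[1], f[4]), implies(f[5], f[7]),                        # R4
        implies(f[1] and f[2], f[3] and f[0]),                           # L1
        implies(f[5] and f[6], f[2] and f[7]),
        implies(f[4] and alw_perm, f[3]), implies(f[8] and alw_perm, f[0]),  # L2
        implies(f[3] and alw_del, f[2]), implies(f[7] and alw_del, f[6]),
        implies(f[8], f[1] and f[4]), implies(f[3], f[5] and f[7]),      # L3
    ])


def weights(p):
    n_points = sum(p.values())
    n_preds = min(1, p[1] + p[2]) + min(1, p[5] + p[6])
    n_clauses = p[1] + p[2] + p[5] + p[6]
    return [n_points, n_preds, n_clauses]


def candidate_slices():
    """All admissible slices with their weights, smallest weights first."""
    found = []
    for bits in product((0, 1), repeat=7):
        p = {0: 1, 8: 0}                      # the given query fixes P0 and P8
        p.update({i + 1: b for i, b in enumerate(bits)})
        if satisfies(p):
            found.append((weights(p), {i for i in range(1, 8) if p[i] == 1}))
    return sorted(found, key=lambda entry: entry[0])


for w, s in candidate_slices():
    print(w, sorted(s))
# [3, 2, 2] [2, 6]
# [4, 2, 2] [2, 3, 5]
# [5, 2, 3] [2, 3, 5, 6]
# [6, 2, 3] [2, 3, 5, 6, 7]
```

Of the 128 possible assignments only four satisfy the rules, and they are the four candidates listed in the appendix, in the same weight order.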
Abstract. Tabling avoids many of the shortcomings of SLD(NF) execution
and provides a more flexible and efficient execution mechanism
for logic programs. In particular, tabled execution of logic programs ter- 
minates more often than execution based on SLD-resolution. One of the 
few works studying termination under a tabled execution mechanism is 
that of Decorte et al. They introduce and characterise two notions of 
universal termination of logic programs w.r.t. sets of queries executed 
under SLG-resolution, using the left-to-right selection rule; namely the 
notion of quasi-termination and the (stronger) notion of LG-termination. 
This paper extends the results of Decorte et al. in two ways: (1) we
consider a mix of tabled and Prolog execution, and (2) besides a charac- 
terisation of the two notions of universal termination under such a mixed 
execution, we also give modular termination conditions. From both prac- 
tical and efficiency considerations, it is important to allow tabled and 
non-tabled predicates to be freely intermixed. This motivates the first 
extension. Concerning the second extension, it was already noted in the
literature in the context of termination under SLD-resolution (by e.g. Apt 
and Pedreschi), that it is important for programming in the large to have 
modular termination proofs, i.e. proofs that are capable of combining 
termination proofs of separate programs to obtain termination proofs of 
combined programs. 



1 Introduction 

The extension of SLD-resolution with a tabling mechanism [4,15,18] avoids many
of the shortcomings of SLD(NF) execution and provides a more flexible and often 
considerably more efficient execution mechanism for logic programs. In particu- 
lar, tabled execution terminates more often than execution based on SLD. So, if 
a program and query can be proven terminating under SLD-resolution (by one 
of the existing techniques surveyed in [5]), then they will also trivially terminate 
under SLG-resolution, the resolution principle of tabulation [4]. However, since 
there are programs and queries which terminate under SLG-resolution and not 

* This research was conducted during the author’s stay at the K.U. Leuven, Belgium. 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 342-359, 1999. 

(c) Springer-Verlag Berlin Heidelberg 1999

under SLD-resolution, more effective proof techniques can be found. This paper
is one of the few works studying termination of tabled logic programs. 

We base our approach on the work of Decorte et al. [8]. There, two notions of
universal termination of a tabled logic program w.r.t. a set of queries executed
under SLG-resolution using the left-to-right selection rule (called LG-resolution
in the sequel) are introduced and characterised, namely the notion of
quasi-termination and the (stronger) notion of LG-termination. We extend the results
of [8] in two ways: (1) we consider a mix of LG-resolution and LD-resolution, i.e.
a mix of tabled and Prolog execution, and (2) besides a characterisation of the
two notions of universal termination under such a mixed execution schema, we
also give modular termination conditions, i.e. conditions on two programs P and
R, where P extends R, ensuring termination of the union P ∪ R. The motivation
for extension (1) will be given in the next section. There, examples from context- 
free grammar recognition and parsing are given, which show that, from the point 
of view of efficiency, it is important to allow tabled and non-tabled predicates 
to be freely intermixed. Extension (2) was already motivated in the literature in 
the context of termination under SLD-resolution (see for instance [3]). Indeed, it 
is important for programming in the large, to have modular termination proofs, 
i.e. proofs that are capable of combining termination proofs of separate programs 
to obtain termination proofs of combined programs. 

The rest of the paper is structured as follows: In the next section we present 
examples which motivate the need to freely mix tabled and Prolog execution. In 
Section 3, we introduce some necessary concepts and present a definition of SLG- 
resolution. Section 4 introduces a first notion of termination under tabled evalu- 
ation: quasi-termination. In addition, a characterisation for quasi-termination is 
given which generalises the characterisation given in [8] to the case where Prolog 
and tabled execution are intermixed. Then, a modular termination condition is 
given for the quasi-termination of the union P ∪ R of two programs P and R,
where P extends R. In Section 5, the stronger notion of LG-termination is 
defined and characterized, and a method for obtaining modular proofs for LG- 
termination is presented. We conclude with discussing related and future work. 
We refer to the full version of the paper, [17], for more results and examples and 
for the proofs of all the theorems, propositions and statements made here. 

2 Mixing Tabled and Prolog Execution: Some Motivation

It has long been noted in the literature that tabled evaluation can be used for 
context-free grammar recognition and parsing: tabling eliminates redundancy 
and handles grammars that would otherwise infinitely loop under Prolog-style 
execution (e.g. left recursive ones). The program of Fig. 1 where all predicates 
are tabled, provides such an example. This grammar, recognizing arithmetic 
expressions containing additions and multiplications over the integers, is left 
recursive — left recursion is used to give the arithmetic operators their proper 
associativity — and would be non-terminating for Prolog-style execution. Under 
tabled execution, left recursion is handled correctly. In fact, one only needs to

expr(Si, So) ← expr(Si, S1), S1 = ['+'|S2], term(S2, So)
expr(Si, So) ← term(Si, So)
term(Si, So) ← term(Si, S1), S1 = ['*'|S2], primary(S2, So)
term(Si, So) ← primary(Si, So)
primary(Si, So) ← Si = ['('|S1], expr(S1, S2), S2 = [')'|So]
primary(Si, So) ← Si = [I|So], integer(I)

Fig. 1. A tabled program recognizing simple arithmetic expressions.



table predicates expr/2 and term/2 to get the desired termination behaviour;
we can and will safely drop the tabling of primary/2 in the sequel.



To see why a mix of tabled with Prolog execution is desirable in practice, 
suppose that we want to extend the above recognition grammar to handle ex- 
ponentiation. The most natural way to do so is to introduce a new nonterminal, 
named factor, for handling exponentiation and make it right recursive, since the 
exponentiation operator is right associative. The resulting grammar is as below 
where only the predicates expr/2 and term/2 are tabled. Note that, at least



expr(Si, So) ← expr(Si, S1), S1 = ['+'|S2], term(S2, So)
expr(Si, So) ← term(Si, So)
term(Si, So) ← term(Si, S1), S1 = ['*'|S2], factor(S2, So)
term(Si, So) ← factor(Si, So)
factor(Si, So) ← primary(Si, S1), S1 = ['^'|S2], factor(S2, So)
factor(Si, So) ← primary(Si, So)
primary(Si, So) ← Si = ['('|S1], expr(S1, S2), S2 = [')'|So]
primary(Si, So) ← Si = [I|So], integer(I)



as far as termination is concerned, there is no need to table the new nontermi- 
nal. Indeed, Prolog’s evaluation strategy handles right recursion in grammars 
finitely. In fact, Prolog-style evaluation of right recursion is more efficient than
its tabled-based evaluation: Prolog has linear complexity for a simple right recur- 
sive grammar, but with tabling implemented as in XSB the evaluation could be 
quadratic as calls need to be recorded in the tables using explicit copying. Thus, 
it is important to allow tabled and non-tabled predicates to be freely intermixed, 
and be able to choose the strategy that is most efficient for the situation at hand. 

By using tabling in context-free grammars, one gets a recognition algorithm 
that is a variant of Earley’s algorithm (also known as active chart recognition
algorithm) whose complexity is polynomial in the size of the input expres- 
sion/string [9]. However, often one wants to construct the parse tree(s) for a 
given input string. The usual approach is to introduce an extra argument to 
the nonterminals of the input grammar — representing the portion of the parse 
tree that each rule generates — and naturally to also add the necessary code 
that constructs the parse tree. This approach is straightforward, but as noticed 
in [19], using the same program for recognition as well as parsing may be ex- 
tremely unsatisfactory from a complexity standpoint: in context-free grammars 
recognition is polynomial while parsing is exponential since there can be
exponentially many parse trees for a given input string. The obvious solution is

R:  s(Si, So) ← a(Si, S), S = [b|So]
    a(Si, So) ← a(Si, S), a(S, So)
    a(Si, So) ← Si = [a|So]

P:  s(Si, So, PT) ← a(Si, S), S = [b|So], PT = spt(Pa, b), a(Si, S, Pa)
    a(Si, So, PT) ← a(Si, S), a(S, So), PT = apt(P, P'), a(Si, S, P), a(S, So, P')
    a(Si, So, PT) ← Si = [a|So], PT = a

Fig. 2. A tabled program recognizing and parsing the language a^n b.

to use two interleaved versions of the grammar as in the example program of 
Fig. 2. Note that only a/2, i.e. the recursive predicate of the ‘recognition’ part, 
R, of the program (consisting of predicates s/2 and a/2), needs to be tabled. 
This action allows recognition to terminate and to have polynomial complexity. 
Furthermore, the recognizer can now be used as a filter for the parsing process in 
the following way: only after knowing that a particular part of the input belongs 
to the grammar and having computed the exact substring that each nonterminal 
spans, do we invoke the parsing routine on the nonterminal to construct its (possibly
exponentially many) parse trees. Doing so avoids, e.g., cases where it may
take exponential time to fail on an input string that does not belong to the given
language: an example for the grammar under consideration is the input string
a^n. On the other hand, tabling the ‘parsing’ part of the program (consisting of
predicates s/3 and a/3) does not affect the efficiency of the process complexity- 
wise and incurs a small performance overhead due to the recording of calls and 
their results in the tables. Finally, note that the construction is modular in the 
sense that the ‘parsing’ part of the program, P, depends on the ‘recognition’ 
part, R, but not vice versa — we say that P extends R. 
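
The filtering idea described above can also be sketched outside Prolog. The following Python fragment is only an illustration under stated assumptions: it hard-codes the grammar of Fig. 2, replaces the tabled predicate a/2 by a bottom-up table of spans, and constructs parse trees only for spans that the recognizer has already accepted. The names `a_spans`, `recognize` and `parse_trees` are hypothetical and do not come from the paper.

```python
def a_spans(tokens):
    """Table of all spans (i, j) such that nonterminal 'a' derives tokens[i:j].

    Plays the role of the tabled recognizer a/2; filled in polynomial time.
    """
    n = len(tokens)
    spans = set()
    for i in range(n):
        if tokens[i] == "a":          # a -> 'a'
            spans.add((i, i + 1))
    for width in range(2, n + 1):     # a -> a a
        for i in range(0, n - width + 1):
            j = i + width
            if any((i, k) in spans and (k, j) in spans for k in range(i + 1, j)):
                spans.add((i, j))
    return spans


def recognize(tokens):
    """Recognizer for s -> a 'b' (the role of s/2)."""
    n = len(tokens)
    return n >= 2 and tokens[-1] == "b" and (0, n - 1) in a_spans(tokens)


def parse_trees(tokens):
    """Build parse trees only after recognition has succeeded (the role of s/3)."""
    if not recognize(tokens):
        return []                     # fail fast, no tree construction at all
    spans = a_spans(tokens)

    def a_trees(i, j):
        if (i, j) not in spans:       # the recognizer acts as a filter
            return
        if j == i + 1:
            yield "a"
        for k in range(i + 1, j):
            if (i, k) in spans and (k, j) in spans:
                for left in a_trees(i, k):
                    for right in a_trees(k, j):
                        yield ("apt", left, right)

    return [("spt", tree, "b") for tree in a_trees(0, len(tokens) - 1)]
```

On an input without the final b, `parse_trees` returns right after the polynomial recognition step, whereas enumerating the trees of an accepted input can still take exponential time because there may be exponentially many of them.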



3 Preliminaries 

We assume familiarity with the basic concepts of logic programming [13,1].
Throughout the paper, $P$ will denote a definite logic program and we restrict
ourselves to this class. By $Pred_P$, we denote the set of predicates occurring in $P$,
and by $Def_P$ we denote the set of predicates defined in $P$ (i.e. predicates
occurring in the head of a clause of $P$). By $Rec_P$, resp. $NRec_P$, we denote the set of
(directly or indirectly) recursive, resp. non-recursive, predicates of the program
$P$ (so $NRec_P = Pred_P \setminus Rec_P$). If $A = p(t_1, \ldots, t_n)$, then we denote by $Rel(A)$
the predicate symbol $p$ of $A$; i.e. $Rel(A) = p$.

The extended Herbrand Universe, $U_P^E$, and the extended Herbrand Base, $B_P^E$,
associated with a program $P$, were introduced in [10]. They are defined as
follows. Let $Term_P$ and $Atom_P$ denote the sets of, respectively, all terms and all atoms
that can be constructed from the alphabet underlying $P$. The variant relation,
denoted $\approx$, defines an equivalence. $U_P^E$ and $B_P^E$ are respectively the quotient sets
$Term_P/\approx$ and $Atom_P/\approx$. For any term $t$ (or atom $A$), we denote its class in
$U_P^E$ ($B_P^E$) as $\tilde{t}$ ($\tilde{A}$). However, when no confusion is possible, we omit the tildes.
If $\Pi \subseteq Pred_P$, we denote with $B_\Pi^E$ the subset of $B_P^E$ consisting of (equivalence
classes of) atoms based on the predicate symbols of $\Pi$. So $B_P^E$ can be seen as
an abbreviation of $B_{Pred_P}^E$.
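
To make the variant relation concrete, here is a minimal Python sketch of a variant test. The term encoding is an assumption made for this illustration only: compound terms are tuples whose first component is the functor, variables are strings starting with an upper-case letter, and all other strings are constants. The function names are hypothetical.

```python
def is_var(term):
    """Assumed convention: variables are strings starting with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def variant(s, t, fwd=None, bwd=None):
    """True iff s and t are equal up to a one-to-one renaming of variables."""
    fwd = {} if fwd is None else fwd   # renaming from variables of s to variables of t
    bwd = {} if bwd is None else bwd   # inverse renaming, to enforce injectivity
    if is_var(s) and is_var(t):
        if fwd.setdefault(s, t) != t:
            return False
        return bwd.setdefault(t, s) == s
    if is_var(s) or is_var(t):
        return False
    if isinstance(s, tuple) and isinstance(t, tuple):
        return (len(s) == len(t) and s[0] == t[0]
                and all(variant(a, b, fwd, bwd) for a, b in zip(s[1:], t[1:])))
    return s == t
```

Under this encoding, `("member", "Xs", "a")` and `("member", "Ys", "a")` are variants and therefore denote the same element of the extended Herbrand Base, whereas `("p", "X", "Y")` and `("p", "Z", "Z")` are not.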

Let $P$ be a program and $p$, $q$ two predicate symbols of $P$. We say that $p$
refers to $q$ in $P$ iff there is a clause in $P$ with $p$ in the head and $q$ occurring
in the body. We say that $p$ depends on $q$ in $P$, and write $p \sqsupseteq q$, iff $(p, q)$ is in
the reflexive, transitive closure of the relation refers to. Note that, by definition,
each predicate depends on itself. We write $p \simeq q$ iff $p \sqsupseteq q$ and $q \sqsupseteq p$ ($p$ and $q$ are
mutually recursive or $p = q$). The dependency graph $G$ of a program $P$ is a graph
where the nodes are labeled with the predicates of $Pred_P$. There is a directed arc
from $p$ to $q$ in $G$ iff $p$ refers to $q$. Finally, we will say that a program $P$ extends
a program $R$ iff no predicate defined in $P$ occurs in $R$.
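
The relations just defined are easy to compute for small programs. The Python sketch below assumes a deliberately simplified representation in which a clause is a pair of a head predicate and a list of body predicates, each written as a string such as "a/2"; arguments are ignored because only predicate symbols matter here. All function names are hypothetical. The two example programs are the recognition part R and the parsing part P of Fig. 2.

```python
def predicates(program):
    """All predicates occurring in the program (heads and bodies)."""
    preds = set()
    for head, body in program:
        preds.add(head)
        preds.update(body)
    return preds


def defined(program):
    """Predicates occurring in the head of some clause."""
    return {head for head, _ in program}


def depends_on(program):
    """Map each predicate p to the set of all q such that p depends on q
    (reflexive, transitive closure of 'refers to')."""
    preds = predicates(program)
    reach = {p: {p} for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in program:
            for p in preds:
                if head in reach[p] and not set(body) <= reach[p]:
                    reach[p] |= set(body)
                    changed = True
    return reach


def mutually_dependent(program, p, q):
    """p and q depend on each other (this includes the case p == q)."""
    reach = depends_on(program)
    return q in reach[p] and p in reach[q]


def recursive(program):
    """Directly or indirectly recursive predicates."""
    reach = depends_on(program)
    return {head for head, body in program if any(head in reach[q] for q in body)}


def extends(p_prog, r_prog):
    """p_prog extends r_prog iff no predicate defined in p_prog occurs in r_prog."""
    return not (defined(p_prog) & predicates(r_prog))


# Recognition part R and parsing part P of Fig. 2 (predicate symbols only).
R = [("s/2", ["a/2", "=/2"]), ("a/2", ["a/2", "a/2"]), ("a/2", ["=/2"])]
P = [("s/3", ["a/2", "=/2", "=/2", "a/3"]),
     ("a/3", ["a/2", "a/2", "=/2", "a/3", "a/3"]),
     ("a/3", ["=/2", "=/2"])]
```

With these definitions, `extends(P, R)` holds while `extends(R, P)` does not, which matches the remark in Section 2 that the parsing part depends on the recognition part but not vice versa.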

In analogy to [2], we will refer to SLD-derivations (see [13]) following the
left-to-right selection rule as LD-derivations. Other concepts will adopt this naming
accordingly. For a program $P$ and set $S \subseteq B_P^E$, we denote by $Call(P, S)$ the
subset of $B_P^E$ such that $\tilde{B} \in Call(P, S)$ whenever an element of $\tilde{B}$ is a selected
atom in an LD-derivation for some $P \cup \{\leftarrow A\}$, with $\tilde{A} \in S$. Throughout, we
will assume that in any derivation of a query w.r.t. a program, representatives
of equivalence classes are systematically provided with fresh variables, to avoid
the necessity of renaming apart. In the sequel, we abbreviate computed answer
substitution with c.a.s. Our termination conditions are based on the following
concept of finitely partitioning level mapping.

Definition 1. ((finitely partitioning) level mapping)

Let $P$ be a program and $L \subseteq B_P^E$. A level mapping on $L$ is a function $l : L \to \mathbb{N}$.
A level mapping $l$ on $L$ is finitely partitioning on $C \subseteq L$ iff for all $n \in \mathbb{N}$:
$\#(l^{-1}(n) \cap C) < \infty$, where $\#$ is the cardinality function.

3.1 SLG-Resolution 

We present a non-constructive definition of SLG-resolution that is sufficient for 
our purposes, and refer to [4,15] for more constructive formulations of (variants) 
of tabled resolution. 

By fixing a tabling for a program $P$, we mean choosing a set of predicates
of $P$ which are tabled. We denote this tabling as $Tab_P$. The complement of this
set of tabled predicates is denoted as $NTab_P = Pred_P \setminus Tab_P$.

Definition 2. (pseudo SLG-tree, pseudo LG-tree) Let $P$ be a definite
program, $Tab_P \subseteq Pred_P$, $\mathcal{R}$ a selection rule and $A$ an atom. A pseudo SLG-tree
w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$ under $\mathcal{R}$ is a tree $\tau_A$ such that:

1. the nodes of $\tau_A$ are labeled with goals along with an indication of the selected
atom according to $\mathcal{R}$,

2. the arcs are labeled with substitutions,

3. the root of $\tau_A$ is $\leftarrow A$,

4. the children of the root $\leftarrow A$ are obtained by resolution against all matching
program clauses in $P$; the arcs are labeled with the corresponding mgu used
in the resolution step,

5. the children of a non-root node labeled with the goal $Q$ where $\mathcal{R}(Q) = B$ are
obtained as follows:

(a) if $Rel(B) \in Tab_P$, then the (possibly infinitely many) children of the
node can only be obtained by resolving the selected atom $B$ of the node
with clauses of the form $B\theta \leftarrow$ (not necessarily in $P$); the arcs are labeled
with the corresponding mgu used in the resolution step (i.e. $\theta$),

(b) if $Rel(B) \in NTab_P$, then the children of the node are obtained by
resolution of $B$ against all matching program clauses in $P$, and the arcs are
labeled with the corresponding mgu used in the resolution step.

If $\mathcal{R}$ is the leftmost selection rule, $\tau_A$ is called a pseudo LG-tree w.r.t. $Tab_P$ for
$P \cup \{\leftarrow A\}$.

We say that a pseudo SLG-tree $\tau_A$ w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$ is smaller than
another pseudo SLG-tree $\tau'_A$ w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$ iff $\tau'_A$ can be obtained
from $\tau_A$ by attaching new sub-branches to nodes in $\tau_A$.

A (computed) answer clause of a pseudo SLG-tree $\tau_A$ w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$
is a clause of the form $A\theta \leftarrow$ where $\theta$ is the composition of the substitutions found
on a branch of $\tau_A$ whose leaf is labeled with the empty goal.

Intuitively, a pseudo SLG-tree (in an SLG-forest, see Definition 3 below)
represents the tabled computation (w.r.t. $Tab_P$) of all answers for a given subgoal
labeling the root node of the tree. The trees in the above definition are called
pseudo SLG-trees because there is no condition yet on which clauses $B\theta \leftarrow$
exactly are to be used for resolution in point 5a. These clauses represent the
answers found (possibly in another tree of the forest) for the selected tabled
atom. This interaction between the trees in an SLG-forest is captured in the
following definition.

Definition 3. (SLG-forest, LG-forest) Let $P$ be a definite program, $Tab_P \subseteq
Pred_P$, $\mathcal{R}$ be a selection rule and $T$ be a (possibly infinite) set of atoms such
that no two different atoms in $T$ are variants of each other. $F$ is an SLG-forest
w.r.t. $Tab_P$ for $P$ and $T$ under $\mathcal{R}$ iff $F$ is a set of minimal pseudo SLG-trees
$\{\tau_A \mid A \in T\}$ w.r.t. $Tab_P$ where

1. $\tau_A$ is a pseudo SLG-tree w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$ under $\mathcal{R}$,

2. every selected tabled atom $B$ of each node in every $\tau_A \in F$ is a variant of
an element $B'$ of $T$, such that every clause resolved with $B$ is a variant of
an answer clause of $\tau_{B'}$ and vice versa, for every answer clause of $\tau_{B'}$ there
is a variant of this answer clause which is resolved with $B$.

Let $S$ be a set of atoms. An SLG-forest for $P$ and $S$ w.r.t. $Tab_P$ under $\mathcal{R}$ is an
SLG-forest w.r.t. $Tab_P$ for a minimal set $T$ with $S \subseteq T$. If $S = \{A\}$, then we
also talk about the SLG-forest for $P \cup \{\leftarrow A\}$.

An LG-forest is an SLG-forest containing only pseudo LG-trees.

Point 2 of Definition 3, together with the imposed minimality of trees in 
a forest, now uniquely determines these trees. So we can drop the designation 
“pseudo” and refer to (S)LG-trees in an (S)LG-forest.

Note that, if $Tab_P = \emptyset$, the (S)LG-forest of $P \cup \{\leftarrow A\}$ consists of one tree:
the (S)LD-tree of $P \cup \{\leftarrow A\}$. We use the following artificial tabled program to
illustrate the concepts that we introduced.

Example 1. Let $P$ be the following program, with $Tab_P = \{member/2\}$.

    intersection(Xs, Ys, Z) ← member(Xs, Z), member(Ys, Z)
    member([Z|Zs], Z) ←
    member([X|Zs], Z) ← member(Zs, Z)

Let $S = \{intersection(Xs, Ys, a)\}$. Then, $Call(P, S) = S \cup \{member(Zs, a)\}$ and
the LG-forest for $P$ and $S$ is shown in Fig. 3. Note that there is a finite number of
LG-trees, all with finite branches, but the trees have infinitely branching nodes.



[Figure: LG-trees rooted at intersection(Xs,Ys,a) and member(Xs,a); inner nodes labeled member(Ys,a).]

Fig. 3. The LG-forest for $P \cup \{\leftarrow intersection(Xs, Ys, a)\}$.

Note that we can use the notions of LD-derivation and LD-computation (as
they appear for instance in the definition of the call set $Call(P, S)$) even in the
context of SLG-resolution, as the set of call patterns and the set of computed
answer substitutions are not influenced by tabling; see e.g. [12, Theorem 2.1].

4 Quasi-Termination 

A first basic notion of (universal) termination under a tabled execution
mechanism is quasi-termination. It is defined as follows (see [8, Definition 3.1] for the
case $Tab_P = Pred_P$).

Definition 4. (quasi-termination) Let $P$ be a program, $Tab_P \subseteq Pred_P$ and
$S \subseteq B_P^E$. $P$ quasi-terminates w.r.t. $Tab_P$ and $S$ iff for all $A$ such that $\tilde{A} \in S$,
the LG-forest w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$ consists of a finite number of
LG-trees without infinite branches. Also, $P$ quasi-terminates w.r.t. $S$ iff $P$
quasi-terminates w.r.t. $Pred_P$ and $S$.

Note that it is not required that the LG-trees are finitely branching in their
nodes. In the next section, we introduce and provide conditions for the stronger
notion of LG-termination, which requires that the LG-forest consists of a
finite number of finite trees (i.e. trees with finite branches and which are finitely
branching). Recall the program $P$ and set $S$ of Example 1. $P$ quasi-terminates
w.r.t. $\{member/2\}$ and $S$. Note that $P$ does not LG-terminate w.r.t. $\{member/2\}$
and $S$.

Many works (see [5] for a survey) address the problem of LD-termination: a
program $P$ is said to be LD-terminating w.r.t. a set $S \subseteq B_P^E$ iff for all $A$ such
that $\tilde{A} \in S$, the LD-tree of $P \cup \{\leftarrow A\}$ is finite (see for instance [6]). It can be
easily shown that, if $P$ LD-terminates w.r.t. $S$, then $P$ quasi-terminates w.r.t.
$Tab_P$ and $S$ (for every $Tab_P \subseteq Pred_P$). As shown in Example 1, the notion of
LD-termination is strictly stronger than the notion of quasi-termination.

Since quasi-termination requires that there are only finitely many LG-trees
in the LG-forest of a query, there can only be a finite number of tabled atoms in
the call set of that query. More formally: if a program $P$ quasi-terminates w.r.t.
$Tab_P$ and $S$, then, for every $A \in S$, $Call(P, \{A\}) \cap B_{Tab_P}^E$ is finite.

In [8], the special case where $Tab_P = Pred_P$, i.e. where all predicates
occurring in $P$ are tabled, is considered. If $Tab_P = Pred_P$, an LG-tree cannot have
infinite branches. So, $P$ quasi-terminates w.r.t. a set $S$ iff for all $A$ such that
$\tilde{A} \in S$, the LG-forest for $P \cup \{\leftarrow A\}$ consists of a finite number of LG-trees. In
[8, Lemma 3.1] the following equivalence was proven: $P$ quasi-terminates w.r.t.
$S$ iff for every $A \in S$, $Call(P, \{A\})$ is finite. We want to note that neither
direction (if nor only-if) holds in case the tabled predicates of the program
$P$ are a strict subset of $Pred_P$. We refer to [17] for two counterexamples, one
for each direction.

4.1 Characterisation of Quasi-Termination

In order to state a necessary and sufficient condition for quasi-termination, we
need to make an assumption on the set $Tab_P$ of tabled predicates of the
program $P$. If the assumption is not satisfied, the condition is sufficient (but not
necessary).

Definition 5. (well-chosen tabling (w.r.t. a program)) Let $P$ be a
program, $Pred_P = Tab_P \sqcup NTab_P$ and $G$ be the predicate dependency graph of
$P$. The tabling $Tab_P$ is called well-chosen w.r.t. the program $P$ iff for every
$p, q \in NTab_P$ such that $p \simeq q$, exactly one of the following two conditions holds:

$C_1(p, q)$: no cycle of directed arcs in $G$ containing $p$ and $q$ contains a predicate
from $Tab_P$.

$C_2(p, q)$: all cycles of directed arcs in $G$ containing $p$ and $q$ contain at least one
predicate from $Tab_P$.

In particular, if $NTab_P \subseteq \{p \in Pred_P \mid p$ is a non-recursive or only directly
recursive predicate$\}$ or if $NTab_P = \emptyset$ (i.e. $Tab_P = Pred_P$), then the tabling
$Tab_P$ is well-chosen w.r.t. $P$.

The next theorem provides a necessary and sufficient condition for
quasi-termination of a program $P$ w.r.t. a tabling and a set of atoms, in case the
tabling is well-chosen w.r.t. $P$.

Theorem 1. (characterisation of quasi-termination in case the tabling
is well-chosen) Let $P$ be a program, $Tab_P \subseteq Pred_P$ and $S \subseteq B_P^E$. Suppose the
tabling $Tab_P$ is well-chosen w.r.t. $P$. Then, $P$ quasi-terminates w.r.t. $Tab_P$ and
$S$ iff there is a level mapping $l$ on $B_P^E$ such that for every $A \in S$, $l$ is finitely
partitioning on $Call(P, \{A\}) \cap B_{Tab_P}^E$, and such that

- for every atom $A$ such that $\tilde{A} \in Call(P, S)$,

- for every clause $H \leftarrow B_1, \ldots, B_n$ in $P$, such that $mgu(A, H) = \theta$ exists,

- for every $1 \le i \le n$ and for every LD-c.a.s. $\theta_{i-1}$ for $\leftarrow (B_1, \ldots, B_{i-1})\theta$:

$$l(A) \ge l(B_i\theta\theta_{i-1})$$

and

$$l(A) > l(B_i\theta\theta_{i-1}) \quad \text{if } Rel(A) \simeq Rel(B_i) \in NTab_P \text{ and } C_2(Rel(A), Rel(B_i)) \text{ does not hold.}$$



Note that [8, Theorem 3.1] is an instance of this theorem, with $Tab_P =
Pred_P$. We illustrate the intuition behind Theorem 1 with the following example.

Example 2. Consider the following three propositional programs $P$, $P'$ and $P''$:



with $Tab_P = Tab_{P'} = Tab_{P''} = \{p\}$ and $S = \{p\}$. The LG-forests for $P \cup \{\leftarrow p\}$,
$P' \cup \{\leftarrow p\}$ and $P'' \cup \{\leftarrow p\}$ are shown in Fig. 4. $P$ and $P''$ do not quasi-terminate
w.r.t. $\{p\}$, whereas $P'$ does.

Fig. 4. The LG-forests for $P \cup \{\leftarrow p\}$, $P' \cup \{\leftarrow p\}$, and $P'' \cup \{\leftarrow p\}$.

Note that for programs $P$ and $P'$, the tablings are well-chosen. Also note that,
because the programs are propositional, every level mapping will be finitely
partitioning on the whole Herbrand base.

Let us first consider program $P$. For this program condition $C_1(q, r)$ holds. Also
note that there is no level mapping $l$ such that $l(q) > l(r)$ and $l(r) > l(q)$. Hence,
the condition in Theorem 1 cannot be satisfied and $P$ does not quasi-terminate
w.r.t. $\{p\}$.

Consider next program $P'$ for which condition $C_2(q, r)$ holds. Let $l$ be the
following level mapping: $l(p) = l(q) = l(r) = 0$. With this level mapping, $P'$ satisfies
the condition of Theorem 1 and hence, $P'$ quasi-terminates w.r.t. $\{p\}$.

Finally, note that for the program $P''$, the tabling is not well-chosen w.r.t. $P''$.

In case the tabling for a program is not well-chosen (like for the program $P''$
of Example 2), the condition of Theorem 1 is sufficient (but not necessary) for
proving quasi-termination.



4.2 Modular Proofs for Quasi-Termination

We now present a proposition which gives a modular proof for the
quasi-termination of the union $P \cup R$ of two programs $P$ and $R$, such that $P$ extends $R$. If
$Pred_{P \cup R} = Tab_{P \cup R} \sqcup NTab_{P \cup R}$, let

$Tab_P = Tab_{P \cup R} \cap Pred_P$, $NTab_P = NTab_{P \cup R} \cap Pred_P$,

$Tab_R = Tab_{P \cup R} \cap Pred_R$, $NTab_R = NTab_{P \cup R} \cap Pred_R$.



Proposition 1. Suppose $P$ and $R$ are two programs, such that $P$ extends $R$.
Let $S \subseteq B_{P \cup R}^E$. If

- $R$ quasi-terminates w.r.t. $Tab_R$ and $Call(P \cup R, S)$,

- there is a level mapping $l$ on $B_P^E$ such that for every $A \in S$, $l$ is finitely
partitioning on $Call(P \cup R, \{A\}) \cap B_{Tab_P}^E$, and such that

• for every atom $A$ such that $\tilde{A} \in Call(P \cup R, S)$,

• for every clause $H \leftarrow B_1, \ldots, B_n$ in $P$ such that $mgu(A, H) = \theta$ exists,

• for every $1 \le i \le n$ and for every LD-c.a.s. $\theta_{i-1}$ in $P \cup R$ for $\leftarrow (B_1, \ldots, B_{i-1})\theta$:

$$l(A) \ge l(B_i\theta\theta_{i-1})$$

and

$$l(A) > l(B_i\theta\theta_{i-1}) \quad \text{if } Rel(A) \simeq Rel(B_i) \in NTab_P \text{ and } C_2(Rel(A), Rel(B_i)) \text{ does not hold,}$$

then $P \cup R$ quasi-terminates w.r.t. $Tab_{P \cup R}$ and $S$.

The program and query of Example 1 can be proven to quasi-terminate by
applying Proposition 1 (see [17]). In [17] we give three more propositions for
proving quasi-termination in a modular way. They all consider special cases of
Proposition 1: in the first of these propositions, no defined predicate of $P$ is
tabled, in the second one all the defined predicates in $P$ are tabled, and in the
last one, $P$ and $R$ extend each other.

Note that the above modular termination conditions prove the
quasi-termination of $P \cup R$ without constructing an appropriate level mapping which
satisfies the condition of Theorem 1. We refer to [17] where modular termination
conditions for quasi-termination are given which construct (from simpler level
mappings) a level mapping such that $P \cup R$ and this level mapping satisfy the
condition for quasi-termination of Theorem 1.

5 LG-Termination 

The notion of quasi-termination only partially corresponds to our intuitive notion
of a terminating execution of a query against a tabled program. This notion only
requires that the LG-forest consists of a finite number of LG-trees without
infinite branches, yet these trees can have infinitely branching nodes. To capture
this source of non-termination for a tabled computation, the following stronger
notion is introduced (see [8, Def. 4.1] for the special case where $Tab_P = Pred_P$).

Definition 6. (LG-termination) Let $P$ be a program, $Tab_P \subseteq Pred_P$ and
$S \subseteq B_P^E$. $P$ LG-terminates w.r.t. $Tab_P$ and $S$ iff for every atom $A$ such that
$\tilde{A} \in S$, the LG-forest w.r.t. $Tab_P$ for $P \cup \{\leftarrow A\}$ consists of a finite number of
finite LG-trees.

As already noted, the program $P$ of Example 1 does not LG-terminate
w.r.t. $\{member/2\}$ and $\{intersection(Xs, Ys, a)\}$. Obviously, the notion of
LG-termination is (strictly) stronger than the notion of quasi-termination. Also,
LD-termination implies LG-termination.

Consider two tablings for a program $P$: one with set of tabled predicates equal
to $Tab_1 \subseteq Pred_P$, the other with set of tabled predicates equal to $Tab_2 \subseteq Pred_P$.
Suppose $Tab_1 \subseteq Tab_2$ (hence $NTab_1 \supseteq NTab_2$). The next proposition studies
the relationship between the LG-termination of $P$ w.r.t. these two tablings. We
note that it does not hold for quasi-termination; see [17] for a counterexample.

Proposition 2. Let $P$ be a program. Let $Pred_P = Tab_1 \sqcup NTab_1$ and $Pred_P =
Tab_2 \sqcup NTab_2$. Suppose $Tab_1 \subseteq Tab_2$. Let $S \subseteq B_P^E$. If $P$ LG-terminates w.r.t.
$Tab_1$ and $S$, then $P$ LG-terminates w.r.t. $Tab_2$ and $S$.

We now relate the notions of quasi-termination and LG-termination in a 
more detailed way: By definition, quasi-termination only corresponds to part 
of the LG-termination notion; it fails to capture non-termination caused by an 
infinitely branching node in an LG-tree. Note that if an LG-forest contains a tree 
with an infinitely branching node, then there is an LG-tree in the forest which 
is infinitely branching in a node which contains a goal with a recursive, tabled 
atom at the leftmost position. This observation leads to the following lemma. 
Let us denote the set of tabled, recursive predicates in a program $P$ with $TR_P$:

$$TR_P = Tab_P \cap Rec_P.$$

Lemma 1. Let $P$ be a program, $Tab_P \subseteq Pred_P$ and $S \subseteq B_P^E$. $P$ LG-terminates
w.r.t. $Tab_P$ and $S$ iff $P$ quasi-terminates w.r.t. $Tab_P$ and $S$ and for all $A \in S$
the set of (LD-)computed answers for atoms in $Call(P, \{A\}) \cap B_{TR_P}^E$ is finite.

It follows from the proof of this lemma (see [17]) that, if $P$ LG-terminates
w.r.t. $Tab_P$ and $S$, the set of computed answers for atoms in $Call(P, \{A\})$ is
finite for all $A \in S$. We now present a characterisation of LG-termination.

5.1 Characterisation of LG-Termination 

First (in Theorem 2), we will characterise LG-termination of a program $P$ in
terms of quasi-termination of the program $P^a$ which is obtained by applying
the answer-transformation (Definition 7) on $P$. However, we will also characterise
LG-termination in a more direct way (Theorem 3).

Lemma 1 gives the intuition behind the answer-transformation that we are
about to present. The answer-transformation forms the basis of the
characterisation of LG-termination in Theorem 2; LG-termination of a program $P$ will be
shown to be equivalent with quasi-termination of the program $P^a$, obtained by
applying the answer-transformation to $P$.

Definition 7. (a(nswer)-transformation) Let $P$ be a program with tabling
$Tab_P \subseteq Pred_P$. The a-transformation is defined as follows:

- For a clause $C = H \leftarrow B_1, \ldots, B_n$ in $P$, we define

$$C^a = H \leftarrow B_1, B_1^*, \ldots, B_n, B_n^*$$

with $B_i^*$ defined as follows (suppose $B_i = p(t_1, \ldots, t_n)$):
if $p \simeq Rel(H)$ and $p \in Tab_P$, then $B_i^* = p^a(t_1, \ldots, t_n)$, where $p^a/n$ is a new
predicate, else $B_i^* = \emptyset$.

Let $TR_P^a = \{p^a/n \mid p/n \in TR_P\}$ (recall that $TR_P = Tab_P \cap Rec_P$).

- For the program $P$, we define

$$P^a = \{C^a \mid C \in P\} \cup \{p^a(X_1, \ldots, X_n) \leftarrow \; \mid p^a/n \in TR_P^a\}.$$

- The set of tabled predicates of the program $P^a$ is defined as

$$Tab_{P^a} = Tab_P \cup TR_P^a.$$
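
The answer-transformation can be prototyped in a few lines. The Python sketch below is an illustration under stated assumptions rather than the authors' implementation: atoms are encoded as pairs of a predicate name and a tuple of argument strings, a clause is a pair of a head atom and a list of body atoms, and the new predicate for p is named by appending "^a". The helper names `pred`, `depends_on` and `a_transform` are hypothetical.

```python
def pred(atom):
    """Predicate symbol of an atom, as a (name, arity) pair."""
    name, args = atom
    return (name, len(args))


def depends_on(program):
    """Map each predicate p to the set of predicates that p depends on."""
    preds = {pred(h) for h, _ in program}
    preds |= {pred(b) for _, body in program for b in body}
    reach = {p: {p} for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in program:
            body_preds = {pred(b) for b in body}
            for p in preds:
                if pred(head) in reach[p] and not body_preds <= reach[p]:
                    reach[p] |= body_preds
                    changed = True
    return reach


def a_transform(program, tabled):
    """Return the transformed program and its set of tabled predicates."""
    reach = depends_on(program)

    def equiv(p, q):                  # p and q depend on each other
        return q in reach[p] and p in reach[q]

    new_program, tr_a = [], set()
    for head, body in program:
        new_body = []
        for atom in body:
            new_body.append(atom)
            p = pred(atom)
            # Only tabled body atoms that are mutually dependent with the
            # head get a companion atom recording their computed answers.
            if p in tabled and equiv(p, pred(head)):
                name, args = atom
                new_body.append((name + "^a", args))
                tr_a.add((name + "^a", len(args)))
        new_program.append((head, new_body))
    # One fact p^a(X1, ..., Xn) <- per new predicate.
    for name, arity in sorted(tr_a):
        fact_args = tuple(f"X{i}" for i in range(1, arity + 1))
        new_program.append(((name, fact_args), []))
    return new_program, set(tabled) | tr_a


# The program of Example 1, with member/2 tabled.
P = [
    (("intersection", ("Xs", "Ys", "Z")),
     [("member", ("Xs", "Z")), ("member", ("Ys", "Z"))]),
    (("member", ("[Z|Zs]", "Z")), []),
    (("member", ("[X|Zs]", "Z")), [("member", ("Zs", "Z"))]),
]
Pa, tab_a = a_transform(P, {("member", 2)})
```

In this run only the recursive clause for member/2 receives an extra body atom, because member/2 is tabled and mutually dependent with the head of that clause; the clause for intersection/3 is left unchanged, and a single fact for the new predicate is added.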

It is easy to see that $Call(P, S) = Call(P^a, S) \cap B_P^E$. Also, if we denote with
$cas(P, \{p(\bar{t})\})$ the set of computed answer substitutions of $P \cup \{\leftarrow p(\bar{t})\}$, then
$cas(P, \{p(\bar{t})\}) = cas(P^a, \{p(\bar{t})\})$ for all $p(\bar{t}) \in B_P^E$. It is important to note that, if
we have a query $p(\bar{t}) \in B_{TR_P}^E$ to a program $P$, then $p(\bar{t})\sigma$ is a computed answer if
$p^a(\bar{t})\sigma \in Call(P^a, \{p(\bar{t})\})$. This is in fact the main purpose of the transformation.

We want to mention that a similar transformation, namely the
solution-transformation, is introduced in [8, Definition 4.2] in order to relate the
concepts of LG-termination and quasi-termination. But, as opposed to the
answer-transformation, the solution-transformation introduces much more “overhead”
in the sense that a clause $C = H \leftarrow B_1, \ldots, B_n$ is transformed into a clause
$C^{sol} = H \leftarrow B_1, sol(B_1), \ldots, B_n, sol(B_n)$ where $sol/1$ is a new tabled
predicate. Notice that in contrast, the answer-transformation only keeps track of the
computed answers of recursive, tabled body atoms (and not of all body atoms).

Example 3. Let $P$ be the program of Example 1, with $Tab_P = \{member/2\}$. The
a-transformation $P^a$ of $P$ is shown in Fig. 5; $Tab_{P^a} = \{member/2, member^a/2\}$.

    intersection(Xs, Ys, Z) ← member(Xs, Z), member(Ys, Z)
    member([Z|Zs], Z) ←
    member([X|Zs], Z) ← member(Zs, Z), member^a(Zs, Z)
    member^a(L, Z) ←

Fig. 5. The a-transformation of P from Example 1.

The following theorem is a generalisation of [8, Theorem 4.1] (there, the
previously mentioned solution-transformation of [8, Definition 4.2] is used to
relate LG-termination and quasi-termination in case $Tab_P = Pred_P$).

Theorem 2. (characterisation of LG-termination in terms of
quasi-termination) Let $P$ be a program, $Tab_P \subseteq Pred_P$ and $S \subseteq B_P^E$. $P$
LG-terminates w.r.t. $Tab_P$ and $S$ iff $P^a$ quasi-terminates w.r.t. $Tab_{P^a}$ and $S$.



Example 4. (Ex. 3 ctd.) The LG-forest of $P \cup \{\leftarrow intersection(Xs, Ys, a)\}$
w.r.t. $Tab_P$ was shown in Fig. 3. Note that the trees are infinitely branching and
hence, $P$ does not LG-terminate w.r.t. $Tab_P$ and $\{intersection(Xs, Ys, a)\}$.

In Fig. 6, the LG-forest of the program $P^a$ and $\{intersection(Xs, Ys, a)\}$ w.r.t.
$Tab_{P^a}$ is shown. Note that there are infinitely many LG-trees in the forest; $P^a$
does not quasi-terminate w.r.t. $Tab_{P^a}$ and $\{intersection(Xs, Ys, a)\}$.



[Figure: LG-trees rooted at intersection(Xs,Ys,a), member(Xs,a), member^a([a|Xls],a), member^a([X,a|Xls],a), ...; inner nodes labeled member(Ys,a).]

Fig. 6. The LG-forest for $P^a \cup \{\leftarrow intersection(Xs, Ys, a)\}$.



Modular Termination Proofs for Prolog with Tabling 355 



Theorem 2 provides a way to prove LG-termination of a program w.r.t. a set of
queries. Namely, it is sufficient to prove quasi-termination of the
a-transformation of the program w.r.t. the set of queries. To prove quasi-termination, we can
use the results of Section 4.1: the condition of Theorem 1, which is necessary
and sufficient in case the tabling is well-chosen¹, and which is sufficient in the
general case. However, the condition of this theorem on $P^a$ can be weakened;
i.e. some of the decreases “$l(A) \ge l(B_i\theta\theta_{i-1})$” need not be checked because
they can always be fulfilled. In particular, we only have to require the non-strict
decrease for recursive, tabled body atoms $B_i$ (to obtain an LG-forest with only
finitely many LG-trees) or for body atoms $B_i$ of the form $p^a(t_1, \ldots, t_n)$ (to obtain
LG-trees which are finitely branching); the conditions on non-tabled predicates
remain the same. The following theorem presents these optimised conditions and
characterises LG-termination of a program in case the tabling is well-chosen. If
the tabling is not well-chosen, the condition is sufficient (but not necessary).

Theorem 3. (characterisation of LG-termination in case the tabling is
well-chosen) Let $P$ be a program, $Tab_P \subseteq Pred_P$ and $S \subseteq B_P^E$. Suppose the
tabling $Tab_P$ is well-chosen w.r.t. $P$. Then, $P$ LG-terminates w.r.t. $Tab_P$ and
$S$ iff there is a level mapping $l$ on $B_{P^a}^E$ such that for every $A \in S$, $l$ is finitely
partitioning on $Call(P^a, \{A\}) \cap B_{TR_P \cup TR_P^a}^E$, and such that

- for every atom $A$ such that $\tilde{A} \in Call(P^a, S)$,

- for every clause $H \leftarrow B_1, \ldots, B_n$ in $P^a$, such that $mgu(A, H) = \theta$ exists,

- for every $B_i$ such that $Rel(B_i) \simeq Rel(H)$ or $Rel(B_i) \in TR_P^a$,

- for every LD-c.a.s. $\theta_{i-1}$ in $P^a$ for $\leftarrow (B_1, \ldots, B_{i-1})\theta$:

$$l(A) \ge l(B_i\theta\theta_{i-1})$$

and

$$l(A) > l(B_i\theta\theta_{i-1}) \quad \text{if } Rel(A) \simeq Rel(B_i) \in NTab_P \text{ and } C_2(Rel(A), Rel(B_i)) \text{ does not hold.}$$

¹ Note that if $Tab_P$ is well-chosen w.r.t. $P$, then also $Tab_{P^a}$ is well-chosen w.r.t. $P^a$.

Example 5. Recall the recognition part $R$ of the grammar program of Fig. 2, where
$Tab_R = \{a/2\}$. We show that $R$ LG-terminates w.r.t. $\{a/2\}$ and $S = \{s(s_i, S_o)\}$
where $s_i$ is a ground list consisting of atoms and $S_o$ is a variable. Consider the
a-transformation $R^a$ of $R$ ($Tab_{R^a} = \{a/2, a^a/2\}$) shown below.

    s(Si, So) ← a(Si, S), S = [b|So]
    a(Si, So) ← a(Si, S), a^a(Si, S), a(S, So), a^a(S, So)
    a(Si, So) ← Si = [a|So]
    a^a(Si, So) ←

When applying Theorem 3, we only have to consider the second clause of $R^a$. Note that,
for all $a(t_1, t_2) \in Call(R^a, \{s(s_i, S_o)\})$, $t_1$ is a sublist of $s_i$ and $t_2$ is a variable.
Also, for all $a^a(v_1, v_2) \in Call(R^a, \{s(s_i, S_o)\})$, $v_1$ is a sublist of $s_i$ and $v_2$ is a
(strict) sublist of $v_1$. Consider the trivial level mapping $l$ (mapping everything
to 0) on $Call(R^a, \{s(s_i, S_o)\}) \cap B_{\{a/2, a^a/2\}}^E$. Since this set is finite, $l$ is obviously
finitely partitioning on this set. $R^a$ and $S$, together with $l$, satisfy the conditions
of Theorem 3. Hence, $R$ LG-terminates w.r.t. $\{a/2\}$ and $S$.

5.2 Modular Proofs for LG-Termination

Similarly to the case of quasi-termination (Section 4.2), we want to be able to
obtain modular termination proofs for LG-termination of the union $P \cup R$ of
two programs $P$ and $R$, where $P$ extends $R$. Note that, because of Theorem 2
and because $(P \cup R)^a = P^a \cup R^a$ (if $P$ extends $R$), we can use the modular
conditions for quasi-termination of Section 4.2. However, as we already noted
in Section 5.1, we can give optimised conditions which require less checking for
decreases between the values under the level mapping of the head and body
atoms. Due to space limitations, in this paper we will only consider the case of
two programs $P$ and $R$, where $P$ extends $R$, and no defined predicate symbol
of $P$ is tabled. Proposition 3 will give modular conditions for LG-termination
of $P \cup R$ in that case, without using Theorem 2. The general case, the case in
which all defined predicate symbols of $P$ are tabled and the case in which the
two programs extend each other are treated in the full version of the paper [17].

Proposition 3. Let $P$ and $R$ be two programs, such that $P$ extends $R$ and such
that $Def_P \subseteq NTab_P$. Let $S \subseteq B_{P \cup R}^E$. If

- $R$ LG-terminates w.r.t. $Tab_R$ and $Call(P \cup R, S)$,

- there is a level mapping $l$ on $B_P^E$ such that

• for every atom $A$ with $\tilde{A} \in Call(P \cup R, S)$,

• for every clause $H \leftarrow B_1, \ldots, B_n$ in $P$ such that $mgu(A, H) = \theta$ exists,

• for every $B_i$, $i \in \{1, \ldots, n\}$, with $Rel(B_i) \simeq Rel(A)$,

• for every LD-c.a.s. $\theta_{i-1}$ in $P \cup R$ for $\leftarrow (B_1, \ldots, B_{i-1})\theta$:

$$l(A) > l(B_i\theta\theta_{i-1})$$

then $P \cup R$ LG-terminates w.r.t. $Tab_{P \cup R}$ and $S$.

Example 6. Recall program $R$ of Example 5. Let $P$ be the parsing part of the
grammar program of Fig. 2 which is also shown below. As already noted in
Section 2, $P$ extends $R$, and the only tabled predicate in $P \cup R$ is $a/2$; see
Section 2 for why this tabling is sufficient.

    s(Si, So, PT) ← a(Si, S), S = [b|So], PT = spt(Pa, b), a(Si, S, Pa)
    a(Si, So, PT) ← a(Si, S), a(S, So), PT = apt(P, P'), a(Si, S, P), a(S, So, P')
    a(Si, So, PT) ← Si = [a|So], PT = a

Let $S = \{s(s_i, S_o, PT)\}$ where $s_i$ is a ground list of atoms, and $S_o$, $PT$ are
distinct variables. We show, by using Proposition 3, that $P \cup R$ LG-terminates
w.r.t. $\{a/2\}$ and $S$.

- $R$ LG-terminates w.r.t. $\{a/2\}$ and $Call(P \cup R, S)$.

Note that, if $a(t_1, t_2) \in Call(P \cup R, S)$, then either $t_1$ is a sublist of $s_i$ and
$t_2$ is a variable, or $t_1$ and $t_2$ are both sublists of $s_i$. In Example 5, we proved
that $R$ LG-terminates w.r.t. this first kind of queries. To prove that $R$
LG-terminates w.r.t. the second kind of queries, we can again apply Theorem 3.
Due to space limitations, we omit the proof here.

- Note first that, if $a(t_1, t_2, PT) \in Call(P \cup R, S)$, then $t_2$ is a (strict) sublist
of $t_1$, $t_1$ is a sublist of $s_i$ and $PT$ is a variable. Let $l$ be the following level
mapping on $Call(P \cup R, S) \cap B_{\{a/3\}}^E$: $l(a(t_1, t_2, PT)) = \|t_1\|_l - \|t_2\|_l$,
where $\|\cdot\|_l$ is the list-length norm. Because of the remark above, $l$ is
well-defined. Note that we only have to consider the recursive clause for $a/3$ in
the analysis.

• Consider the fourth body atom in the recursive clause for $a/3$. If this
clause is called with $a(t_i, t_o, PT)$, with $t_o$ a (strict) sublist of $t_i$, then the
fourth body atom is called as $a(t_i, t, P)$ where $t_o$ is a (strict) sublist of $t$
and $t$ is a (strict) sublist of $t_i$. Hence,
$l(a(t_i, t_o, PT)) = \|t_i\|_l - \|t_o\|_l > \|t_i\|_l - \|t\|_l = l(a(t_i, t, P))$.

• Consider the last body atom. If the clause is called with $a(t_i, t_o, PT)$,
with $t_o$ a (strict) sublist of $t_i$, then the last body atom is called as
$a(t, t_o, P')$ where $t_o$ is a (strict) sublist of $t$ and $t$ is a (strict) sublist of $t_i$.
Hence, $l(a(t_i, t_o, PT)) = \|t_i\|_l - \|t_o\|_l > \|t\|_l - \|t_o\|_l = l(a(t, t_o, P'))$.

We conclude that $P \cup R$ and $S$ satisfy the condition of Proposition 3, so $P \cup R$
LG-terminates w.r.t. $\{a/2\}$ and $S$.
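
The two strict decreases used in this example can be checked mechanically on concrete inputs. The Python sketch below assumes that "sublist" means a suffix of the input list, which is how the difference-list arguments of the grammar program behave; this reading, like the names `level` and `check_decrease`, is an assumption of the sketch. It tests the inequalities on a given input and is a sanity check, not a proof.

```python
def level(t1, t2):
    """Level of a(t1, t2, PT): list length of t1 minus list length of t2."""
    return len(t1) - len(t2)


def check_decrease(s):
    """Check both strict decreases for every call a(ti, to, PT) in which
    to is a strict suffix of ti and ti is a suffix of s, and for every
    intermediate suffix t strictly between ti and to."""
    n = len(s)
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            ti, to = s[i:], s[j:]
            for k in range(i + 1, j):
                t = s[k:]
                # Fourth body atom: a(ti, t, P).
                if not level(ti, to) > level(ti, t):
                    return False
                # Last body atom: a(t, to, P').
                if not level(ti, to) > level(t, to):
                    return False
    return True
```

For instance, `check_decrease(list("aaaaab"))` examines every such triple of suffixes of the input and confirms that the level strictly decreases for both recursive body atoms.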

6 Related and Future Work 

Our work is based on, and significantly extends, the results of [8]. In [8], the
two notions of (universal) termination under tabled execution, namely
quasi-termination and LG-termination, are introduced and characterised. As opposed
to [8], where it is assumed that all predicates in the program are tabled, we here
consider programs with a mix of tabled and Prolog execution, thereby
providing a termination framework for ‘real’ tabled programs. We further extend the
applicability of this framework by presenting modular termination conditions:
conditions ensuring termination of the union $P \cup R$ of two programs $P$ and $R$,
where $P$ extends $R$.

Termination proofs for (S)LD-resolution (such as those surveyed in [5])
are sufficient to prove termination under a tabled execution mechanism, but,
since there are quasi-terminating and LG-terminating programs that are not
LD-terminating, more effective proof techniques can and need to be found.
Besides [8], there are only relatively few works studying termination under tabling.
In the context of well-moded programs, [14] presents a sufficient condition for a 
program to have the bounded term-size property, which implies LG-termination. 
[11] provides another sufficient condition for quasi-termination in the context of 
functional programming. In parallel with the work reported on in this paper, an 
orthogonal extension of the work of [8] was investigated in [16]. Namely, in [16] 



358 



Sofie Verbaeten et al. 



the constraint-based approach of [7] for automatically proving LD-termination
was extended to the case of quasi-termination and LG-termination. More
specifically, in the context of simply moded, well-typed programs and queries,
sufficient conditions for quasi-termination and LG-termination (in the case that
$Tab_P = Pred_P$) are given. These conditions allow reasoning fully at the clause
level, contrary to those in the current paper which are stated for sets of calls.
An integration of these two extensions of [8] is straightforward.

A topic for future research is to extend our results to normal logic programs 
executed under a mix of Prolog and tabled execution. Another, with an arguably 
more practical flavour, is to investigate how the termination conditions presented 
here can form the basis of a compiler that automatically decides on — or at least 
guides a programmer in choosing — a tabling (i.e. a set of tabled predicates) for 
an input program such that quasi-termination of the program is ensured.

Abstract. Software engineering has to reconcile modularity with efficiency.
One way to grapple with this dilemma is to automatically transform
a modularly specified program into an efficiently implementable one.
This is the aim of deforestation transformations which get rid of in- 
termediate data structure constructions that occur when two functions 
are composed. Beyond classical compile time optimization, these trans- 
formations are undeniable tools for generic programming and software 
component specialization. 

Despite numerous research efforts in this area, general transformation
methods cannot deforest some non-trivial intermediate constructions.
These recalcitrant structures are built inside accumulating
parameters, and so they follow a construction scheme that is
independent of the function scheme itself. Known deforestation methods
are too closely tied to fixed recursion schemes to be able to deforest
these structures.

In this article, we show that a fully declarative approach to program
transformation allows new deforestation sites to be detected and treated.
We present the principle of the symbolic composition, based on the at- 
tribute grammar formalism, with an illustrative running example stem- 
ming from a typical problem of standard functional deforestations. 

Keywords: Program transformation, deforestation, attribute grammars, 
functional programming, partial evaluation. 



1 Introduction 

More than a decade ago, P. Wadler said “Intermediate data-structures are both
the basis and the bane of modular programming.” [29]. Indeed, while they allow
functions to be composed, these data-structures also have a harmful cost from
an efficiency point of view (allocation and deallocation). To get the best of both

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 360-377, 1999. 

© Springer-Verlag Berlin Heidelberg 1999

worlds, deforestation transformations were introduced. These source-to-source
transformations fuse two pieces of a program into another one, where interme- 
diate data-structure constructions have been eliminated. 

The main motivation for deforestation transformations was, for a long time,
compiler optimization. More recently, with the emergence of component-based
software development, which requires both automatic software generation and
component specialization, deforestation transformations have attracted renewed
interest, as have partial evaluation and, more generally, high-level
source-to-source program transformations [6,4].

Since 1990, different approaches have been developed in order to improve the
efficiency of deforestation transformations. Wadler’s algorithm [29], based on
Burstall and Darlington’s unfold/fold strategy [1], has been improved and
extended by several works [2,12,25,27]. Another approach, deforestation in
calculational form [11,26,16,28,13], is based on algebraic notions. The latter aims
at using categorical functors to capture both function and data-type patterns of
recursion [18] to guide the deforestation process.

Despite a wide variety of formalisms and notations, all these methods are able to
deforest function compositions like the following:

    let lengapp l1 l2 = length (append l1 l2)

    let length x = case x with
        cons head tail -> 1 + (length tail)
        nil -> 0

    let append l1 l2 = case l1 with
        cons head tail -> cons head (append tail l2)
        nil -> l2

Intuitively, these techniques proceed in three steps. First, they expose
constructors to functions (unfolding).

    let lengapp l1 l2 = case l1 with
        cons head tail -> length (cons head (append tail l2))
        nil -> length l2

Next, they apply a kind of partial evaluation to these terms (application to
constructors), which carries out the elimination of the intermediate data structure.

    let lengapp l1 l2 = case l1 with
        cons head tail -> 1 + (length (append tail l2))
        nil -> length l2

Finally, recursive function calls can be reintroduced or recognized¹ (folding).

    let lengapp l1 l2 = case l1 with
        cons head tail -> 1 + (lengapp tail l2)
        nil -> length l2

¹ Depending on the deforestation method, this step may or may not be implicit in the process.

In the resulting lengapp function definition, the constructions of the intermediate list
have been removed.
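
The effect of these three steps can be checked on a small executable model. The following is only a sketch in Python (not the notation of this article): lists are modelled as tagged tuples, and the allocation counter `stats` and the helper `from_items` are assumptions introduced for illustration.

```python
# Cons lists as tagged tuples: ("nil",) or ("cons", head, tail).
NIL = ("nil",)
stats = {"cons": 0}  # number of cons cells allocated through cons()


def cons(head, tail):
    stats["cons"] += 1
    return ("cons", head, tail)


def from_items(items):
    """Hypothetical helper: build an input list without touching the counter."""
    result = NIL
    for item in reversed(items):
        result = ("cons", item, result)
    return result


def length(x):
    return 0 if x[0] == "nil" else 1 + length(x[2])


def append(l1, l2):
    return l2 if l1[0] == "nil" else cons(l1[1], append(l1[2], l2))


def lengapp_composed(l1, l2):
    # Plain composition: append allocates a list that length consumes at once.
    return length(append(l1, l2))


def lengapp_fused(l1, l2):
    # Definition obtained after unfolding, application to constructors, folding.
    if l1[0] == "nil":
        return length(l2)
    return 1 + lengapp_fused(l1[2], l2)
```

On a first argument of n elements, the composed version allocates n cons cells while the fused version allocates none; both return the same length.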

Even if each technique is particular in its algorithm implementation or in 
its theoretical underlying formalism (rewriting rule system [29], foldr/build 
elimination rule [11], fold normalization [26], hylomorphisms fusion [13]), they 
are more or less based on these three steps [8]. 

The major characteristics of these methods are, on the one hand, to expose
data-structure producers to data-structure consumers in order to find partial
evaluation application sites and, on the other hand, to detect and drive this
deforestation process by following a general recursion scheme² that comes from the
function or the data structure recursive definitions.

Unfortunately, all these methods fail to deforest a class of intermediate
data structures. This concerns functions that build (part of) their
result inside an accumulating parameter, that is, a datum which is neither directly
the result nor the pattern-matched syntactic argument of the function, but an
auxiliary argument. Given a pair of functions to be fused, when the producer
function collects its result in an accumulating parameter, the constructors in
that parameter are protected from the consumer. In this case, no deforestation
normally occurs.

As a first striking example, let us consider the function rev which reverses a 
list. In the following definition, parameter y is initialized with the value nil: 

    let rev x y = case x with
        cons head tail -> rev tail (cons head y)
        nil -> y

The classical functional composition of this function with itself leads to the
function definition let revrev x y z = rev (rev x y) z, where the list built by the
inner rev is the intermediate data structure consumed by the outer rev. As far
as we know, no general³ existing deforestation method allows this composition
to be transformed into a program that solely constructs the final list (x itself).
Indeed, applying the previously presented three steps to this example leads to:

    let revrev x y z = case x with
        cons head tail -> revrev tail (cons head y) z
        nil -> rev y z

During the transformation process, the partial evaluation step has never been
applied, so the intermediate list is still constructed in the revrev function. The only

² This recursion scheme can be exploited very simply (syntactically) or in a more
sophisticated way (using abstract categorical representations such as functors).

³ This particular example could be deforested with a dedicated method [26] that
cannot be applied, for instance, to rev (flat t l) h (cf. section 2).

difference with respect to the classical function composition is that the outer rev
is now applied as soon as the inner inverted list is constructed (in y): instead of 
applying rev to the result of the first rev call, it is applied to the y accumulating 
parameter when it contains the whole inverted list. 
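
This failure can be observed by counting allocations. The sketch below is a Python model, not the notation of this article; the counter `stats` and the helpers `from_items` and `to_items` are assumptions introduced for illustration. It compares the plain composition with the definition obtained by the three steps.

```python
NIL = ("nil",)
stats = {"cons": 0}  # number of cons cells allocated through cons()


def cons(head, tail):
    stats["cons"] += 1
    return ("cons", head, tail)


def from_items(items):
    """Hypothetical helper: build an input list without touching the counter."""
    result = NIL
    for item in reversed(items):
        result = ("cons", item, result)
    return result


def to_items(x):
    """Hypothetical helper: convert a cons list back to a Python list."""
    items = []
    while x[0] == "cons":
        items.append(x[1])
        x = x[2]
    return items


def rev(x, y):
    # y is the accumulating parameter: the reversed list is built inside it.
    return y if x[0] == "nil" else rev(x[2], cons(x[1], y))


def revrev_composed(x, y, z):
    return rev(rev(x, y), z)


def revrev_fused(x, y, z):
    # Unfold/fold result: the outer rev only appears in the nil branch,
    # once y already holds the whole intermediate list.
    if x[0] == "nil":
        return rev(y, z)
    return revrev_fused(x[2], cons(x[1], y), z)
```

For a list of n elements and empty y and z, both versions allocate 2n cells: the n cells of the intermediate reversed list plus the n cells of the final result.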

The reason for this problem is that, since the deforestation methods are based
on function and data-structure recursion schemes, they are only able to guide
a deforestation process which is strongly tied to these fixed recursion schemes.
Thus, they can neither detect nor treat unfaithful constructions in accumulating
parameters. This approach could be viewed as not declarative enough.

By contrast, a distinctive feature of attribute grammars is that they
are fully declarative specifications [19]. They allow a uniform representation of
all computations (results as well as parameters) by simple oriented equations.
More precisely, they distinguish synthesized (bottom-up computed) and inherited
(top-down computed) attributes. In this way, constructions computed in
inherited attributes (those of accumulating parameters) become accessible to a
deforestation process. Indeed, since the operational semantics of an attribute
grammar rests on the resolution of an oriented equation system,⁴ the recursion
scheme of the represented function is no longer explicitly required.

Translating our example into attribute grammars, our deforestation method,
namely the symbolic composition, produces a new attribute grammar that no
longer constructs the intermediate list. This attribute grammar could then be
translated back, by well-known techniques (by way of functional evaluators
[21,15]), into the following function definitions:

    (1) let revrev x y z = f2 x (rev y (f1 x z))

        let f2 x t = case x with
            cons head tail -> f2 tail t
            nil -> t

        let f1 x z = case x with
            cons head tail -> cons head (f1 tail z)
            nil -> z

The intermediate list has been completely discarded in these functions, even
if a useless traversal (f2) of the tree remains. In fact, in the particular case of
attribute grammars, a copy rule elimination could even discard this traversal [23].
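
Definitions (1) can be checked against the original composition on a small model. This is a sketch in Python (not the notation of this article); the counter `stats` and the helpers `from_items` and `to_items` are assumptions introduced for illustration.

```python
NIL = ("nil",)
stats = {"cons": 0}  # number of cons cells allocated through cons()


def cons(head, tail):
    stats["cons"] += 1
    return ("cons", head, tail)


def from_items(items):
    """Hypothetical helper: build an input list without touching the counter."""
    result = NIL
    for item in reversed(items):
        result = ("cons", item, result)
    return result


def to_items(x):
    """Hypothetical helper: convert a cons list back to a Python list."""
    items = []
    while x[0] == "cons":
        items.append(x[1])
        x = x[2]
    return items


def rev(x, y):
    return y if x[0] == "nil" else rev(x[2], cons(x[1], y))


def f1(x, z):
    # Rebuilds x in front of z: this is the only construction related to x.
    return z if x[0] == "nil" else cons(x[1], f1(x[2], z))


def f2(x, t):
    # Useless traversal: walks along x and returns t unchanged.
    return t if x[0] == "nil" else f2(x[2], t)


def revrev_deforested(x, y, z):
    return f2(x, rev(y, f1(x, z)))
```

With empty y and z, a list x of n elements now costs n allocations instead of 2n, and the result agrees with rev (rev x y) z for arbitrary y and z.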



The remainder of this article is structured as follows. First, section 2 presents
syntactic notations, both for functional and attribute grammar languages. Next,
section 3 describes a translation from functional programs into equivalent
attribute grammars. Essentially, it transforms accumulating parameters into
inherited attributes and breaks explicit recursions into oriented equation systems.
Then, section 4 shows the basic principles of the symbolic composition, detailed
on an illustrative running example. In conclusion, we discuss related work and
sketch current and future work on a generalized formalization
of this technique and its implementation.

⁴ The equation system constituted by all attribute occurrence definitions.

2 Language Syntaxes and Notations

Rather than the confusing example revrev of the introduction⁵, we will illustrate
our transformations by the deforestation example of the composition of rev with
flat, where the function flat computes the list of the leaves of a given binary tree
(cf. Fig. 1). The list constructed by flat before being consumed by rev is then
the intermediate data structure to be eliminated. This example constitutes a
typical problem since, as far as we know, no known deforestation method is able
to deal with it. Nevertheless, this example represents the class of functions where
the data-structure producer builds its result with an accumulating variable.



    let flat t l = case t with
        node left right -> flat left (flat right l)
        leaf n -> cons n l

Fig. 1. Function definition for flat




To present the basic steps of our transformations in a simple and clear way,
we deliberately restrict ourselves to a sub-class of first order functional programs
with the syntax⁶ presented in Fig. 2. Nested pattern matchings are not allowed,
but are easy to split into several separate functions. Moreover, if-then-else
statements can be taken into account with Dynamic Attribute Grammars [22].

To bring our attribute grammar notation, presented in Fig. 3, closer to
functional specifications, algebraic type definitions will be used instead of classical
context free grammars [3,9,8]. This notation is not the classical one, but is a
minimal form for explanatory purposes. Thus, a grammar production is
represented as a data-type constructor followed by its parameter variables, that is, a
pattern (for example: cons head tail).

⁵ We prefer the revflat example to the revrev example for explanatory purposes,
because it involves two different functions and thus avoids name confusions.

⁶ The notation x̄ stands for x1 ... xn.



    prog := {def}+
    def  := let f x̄ = exp
          | let f x̄ = case x_k with {pat -> exp}+
    pat  := c x̄
    exp  := Constant
          | x ∈ Variables
          | g exp̄                 Function or constructor call

Fig. 2. Functional language

    block   := aglet f = {f x̄ -> semrule} {pat -> semrule}*
    semrule := occ = exp
    occ     := x.a | f.result
    exp     := Constant
             | y.b ∈ Attribute occurrences
             | x ∈ Variables
             | g exp̄              Attribute grammar or constructor call

Fig. 3. Attribute grammar notation



As previously said, a characteristic feature of attribute grammars is to distin- 
guish two sorts of attributes: the synthesized ones are computed bottom-up over 
the structure and the inherited ones are computed top-down. Since our trans- 
formations will consider type-checked functional programs as input, this induces 
information about the generated attribute grammars. Thus, the sort and the 
type of attributes are directly deduced from the type-checked input program 
and could be implicit. 

Furthermore, the notion of attribute grammar profile is introduced (in Fig. 3,
f x̄ is the profile of f). It represents how to call the attribute grammar and allows
result and arguments to be specified.

The occurrence of an attribute a on a pattern variable x is noted x.a, even if
this pattern variable is the constructor of the current pattern itself⁷. For instance,
according to the syntax in Fig. 3, the function rev could be specified by the
attribute grammar in Fig. 4. This figure also contains an intuitive illustration
for the application rev (cons a (cons b (cons c nil))) nil.

With this notation, the name rev stands at the same time for the attribute
grammar, for the profile constructor and for a synthesized attribute. Variable x,
the list to be reversed, is the pattern-matched argument and h is the parameter.
The attribute result is the only synthesized attribute of the profile. Variable x, and all

⁷ In CFG terms, it plays the role of the left hand side (parent) of the production.



    aglet rev =
        rev x h ->
            rev.result = x.rev
            x.h = h
        cons head tail ->
            cons.rev = tail.rev
            tail.h = cons head cons.h
        nil ->
            nil.rev = nil.h

    [Illustration: attribute dependencies for rev (cons a (cons b (cons c nil))) nil.]

Fig. 4. Attribute grammar rev

pattern-matched (sub-)lists, have two attributes: rev (synthesized) and h (inherited).
Each oriented equation defines an attribute for a given pattern variable.
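
To make the reading of Fig. 4 as an oriented equation system concrete, the following Python sketch instantiates the semantic rules on a concrete list and solves them on demand. It is only an illustration under our own assumptions: the names `rev_equations` and `solve`, and the encoding of attribute occurrences as (position, attribute) keys, are not taken from this article.

```python
NIL = ("nil",)


def rev_equations(x, h):
    """Instantiate the semantic rules of the attribute grammar rev on the list x.

    Each attribute occurrence is a key (position, attribute name) and each
    equation is a function that reads other occurrences through `get`.
    """
    eqs = {}
    eqs[(0, "h")] = lambda get: h                        # x.h = h
    pos, node = 0, x
    while node[0] == "cons":
        head = node[1]
        # cons.rev = tail.rev
        eqs[(pos, "rev")] = lambda get, p=pos: get((p + 1, "rev"))
        # tail.h = cons head cons.h
        eqs[(pos + 1, "h")] = (
            lambda get, p=pos, hd=head: ("cons", hd, get((p, "h")))
        )
        pos, node = pos + 1, node[2]
    eqs[(pos, "rev")] = lambda get, p=pos: get((p, "h"))  # nil.rev = nil.h
    eqs["result"] = lambda get: get((0, "rev"))          # rev.result = x.rev
    return eqs


def solve(eqs, occurrence):
    """Demand-driven resolution: each occurrence is computed at most once."""
    cache = {}

    def get(occ):
        if occ not in cache:
            cache[occ] = eqs[occ](get)
        return cache[occ]

    return get(occurrence)
```

No recursion scheme over the list is written in `solve`: the evaluation order follows the dependencies between attribute occurrences only. Solving for "result" on cons a (cons b (cons c nil)) with h = nil yields the reversed list.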



3 Translation FP-to-AG 



The intuitive idea of the translation FP-to-AG, from a functional program into 
its attribute grammar notation, is the following. Each functional term associ- 
ated with a pattern has to be dismantled into a set of oriented equations, called 
semantic rules. Parameters in functional programs become explicit attributes 
attached to pattern variables, called attribute occurrences, that are defined by 
the semantic rules. Then, explicit recursive calls become implicit on the underly- 
ing data structure and semantic rules make the data-flow explicit. FP-to-AG is 
decomposed into a preliminary transformation and a profile symbolic evaluation. 

The following notations are used in further definitions and transformations.

    ≝              local definition in an algorithm
    x.a = exp      semantic rule defining x.a
    [x := y]       substitution of x by y
    Σ              a set of semantic rules
    Π              a pattern with its set of semantic rules
    C ⊢ A ⇒ B      transformation from A into B according to the context C
    ℰ[e]           a term containing e as a sub-expression



Preliminary Transformation The aim of the preliminary transformation,
presented in Fig. 5, is to draw the general shape of the future attribute grammar.
It introduces the attribute grammar profile, with its semantic rules, and a single
semantic rule for each constructor pattern.

The attribute result is defined as a synthesized attribute of the profile and it
stands for the expected result of the function (rule Let’). For a function with a case
statement, the result is computed through attributes on the pattern-matched
variable (rule Let): one is synthesized, named by the function name itself, and
each supplementary argument of the function profile yields a semantic rule
defining an inherited attribute attached to the pattern-matched variable.

Each function call (f ā) is translated into a dotted notation (f b̄).result (rule
App). This rule distinguishes between function and type constructor calls⁸. Thus,
each expression appearing in a pattern is transformed into a single semantic rule
which defines the synthesized attribute computing the result (rule App). This
induces some renaming (rule Pattern).

The application of the preliminary transformation to the function flat (Fig. 1) 
leads to the result shown in Fig. 7. 



Profile Symbolic Evaluation The result of the preliminary transformation is 
not yet a real attribute grammar. Each function definition in the initial program 
has been translated into one block (cf. Fig. 3) which contains the profile of the 

⁸ This distinction is made using type information from the input functional program.

    ∀i  ⊢exp a_i ⇒ b_i          f is a function name
    -------------------------------------------------- (App)
    ⊢exp (f ā) ⇒ (f b̄).result

    ⊢exp e ⇒ e'
    -------------------------------------------------- (Pattern)
    f, {x_j}_{j≠k}, x_k ⊢pat c ȳ -> e  ⇒  c ȳ -> c.f = e'[x_k := c][x_j := c.x_j]_{∀j≠k}

    ∀i  f, {x_j}_{j≠k}, x_k ⊢pat p_i -> e_i ⇒ Π_i
    Π ≝ { f x̄ -> f.result = x_k.f ; x_k.x_j = x_j (∀j≠k) } ∪ ⋃_i Π_i
    -------------------------------------------------- (Let)
    ⊢let  let f x̄ = case x_k with p̄ -> ē  ⇒  aglet f = Π

    ⊢exp e ⇒ e'
    -------------------------------------------------- (Let’)
    ⊢let  let f x̄ = e  ⇒  aglet f = f x̄ -> f.result = e'

    Constants and variables are left unchanged by the transformation.

    ⊢exp e ⇒ e' means that the equation e is translated into equation e'.
    env ⊢pat p -> e ⇒ p -> ℛ means that the expression associated with the pattern p
        is translated into the set of semantic rules ℛ,
        with respect to the environment env.
    ⊢let 𝒟 ⇒ ℬ means that the function definition 𝒟 is translated
        into the block ℬ.

Fig. 5. Preliminary transformation



    (f x̄ -> f.result = φ ; Σ_aux) ∈ P          σ ≝ [x_i := a_i]
    Σ ≝ { u = ℰ[σ(φ)] ; σ(Σ_aux) }            Check_PSE(c, f, Σ)
    -------------------------------------------------- (PSE)
    P ⊢ c ȳ -> u = ℰ[(f ā).result]  ⇒  c ȳ -> Σ

    P ⊢ p -> Σ1 ⇒ p -> Σ2 means that in the program P the set of equations Σ1
    of a pattern p is transformed into Σ2.

Fig. 6. Profile symbolic evaluation (PSE)

    aglet flat =
        flat t l ->
            flat.result = t.flat
            t.l = l
        node left right ->
            node.flat = (flat left (flat right node.l).result).result
        leaf n ->
            leaf.flat = cons n leaf.l

Fig. 7. The function flat after the preliminary transformation



function and its related patterns. But explicit recursive calls have been translated
into the form (f ā).result. Now, these expressions have to be transformed into
a set of semantic rules, breaking explicit recursions by attribute naming and
attachment to pattern variables. Then, these semantic rules will implicitly define
the recursion à la attribute grammar. This transformation is achieved by the
profile symbolic evaluation (PSE) presented in Fig. 6.

Everywhere an expression (f ā).result occurs, the profile symbolic evaluation
projects the semantic rules of the attribute grammar profile f. The application
of this transformation must be done with a depth-first application strategy.
Nevertheless, the predicate Check_PSE ensures that the resulting attribute grammar
is well formed. Essentially, it verifies that each attribute is defined once and only
once. Then, in the context of well-defined input functional programs, Check_PSE
forbids non-linear terms such as g (f y 1) (f y 2). Moreover, in a first approach,
terms like (x.a).b are not allowed but they will be treated in section 4, to
detect composition sites. Finally, Check_PSE prevents cyclic treatments with the
condition c ≠ f and all these conditions allow FP-to-AG to terminate.

    In the pattern c ȳ -> e,          No term (x.a).b
    e is linear for each y_i          occurs in Σ              c ≠ f
    ----------------------------------------------------------------
    Check_PSE(c, f, Σ)

Wherever Check_PSE(c, f, Σ) is not verified, the expression (f ā).result is simply
rewritten into the function call (f ā).



    (flat t l -> flat.result = t.flat ; t.l = l) ∈ P
    σ = [t := right][l := node.l]
    Σ ≝ { node.flat = (flat left right.flat).result ; right.l = node.l }
    Check_PSE(node, flat, Σ)
    --------------------------------------------------
    flat ⊢ node left right -> node.flat = (flat left (flat right node.l).result).result
         ⇒ node left right -> Σ

Fig. 8. Example of PSE application for the pattern node left right

    aglet flat =
        flat t l ->
            flat.result = t.flat
            t.l = l
        node left right ->
            node.flat = left.flat
            left.l = right.flat
            right.l = node.l
        leaf n ->
            leaf.flat = cons n leaf.l

    [Illustration: attribute dependencies of flat on the tree node (leaf a) (node (leaf b) (leaf c)).]

Fig. 9. Attribute grammar produced by FP-to-AG from function flat



The application⁹ of the profile symbolic evaluation to the semantic rule for
the node pattern is presented in Fig. 8. Finally, complete application of the
profile symbolic evaluation to the function flat leads to the well-formed attribute
grammar given in Fig. 9. This figure also gives an illustrative example of the
application of flat to the tree node (leaf a) (node (leaf b) (leaf c)).
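
The semantic rules of Fig. 9 can be read directly as a small evaluator, in which the inherited attribute l is passed downwards and the synthesized attribute flat is returned. The Python sketch below is only an illustration; the function names `flat_attr` and `flat_profile` and the tuple encoding of trees are assumptions.

```python
NIL = ("nil",)


def flat_attr(t, l):
    """Return the synthesized attribute t.flat, given the inherited attribute t.l."""
    if t[0] == "leaf":
        return ("cons", t[1], l)          # leaf.flat = cons n leaf.l
    _, left, right = t
    right_l = l                           # right.l = node.l
    left_l = flat_attr(right, right_l)    # left.l = right.flat
    return flat_attr(left, left_l)        # node.flat = left.flat


def flat_profile(t, l):
    # Profile: t.l = l and flat.result = t.flat
    return flat_attr(t, l)
```

Each assignment corresponds to exactly one semantic rule of the node or leaf pattern, so the data flow of the attribute grammar stays visible in the code.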

The same algorithm applied to the function rev (given in the introduction) yields
the attribute grammar in Fig. 4. Thus, the successive application of the preliminary
transformation and the profile symbolic evaluation to an input functional program
leads to a real attribute grammar. This is the translation FP-to-AG.

The cost of the preliminary transformation is linear with respect to the depth
of input functional terms. Each function definition yields a profile block, with one
semantic rule for each argument of the function. Furthermore, for each initial
pattern case, a semantic rule defines an attribute occurrence that represents
the value of the function in this case. These equations contain function calls
that will be dismantled by the profile symbolic evaluation. The required number
of applications of this step is proportional to the number of recursive calls they
contain. In this sense, the cost of FP-to-AG depends linearly on the size and
the depth of the input functional program terms.



4 Symbolic Composition 

It is now possible to apply attribute grammar deforestation methods to func- 
tional programs translated by FP-to-AG. Our technique, the symbolic composi- 
tion, is based on the classical descriptional composition of two attribute gram- 
mars due to Ganzinger and Giegerich [10], but extends its application conditions 
and exploits the particular context stemming from translated functional pro- 
grams. In order to describe our symbolic composition, we first present a natural 
extension of profile symbolic evaluation which is useful in the application of the 
symbolic composition. 

⁹ Underlined terms show where the rule is being applied.

It is important to note here that even if the final results of symbolic composition
are attribute grammars, the objects that will be manipulated by intermediate
transformations are blocks of attribute grammars rather than complete
attribute grammars. Furthermore, the expressions of the form (x.a).b, previously
avoided (cf. predicate Check_PSE in PSE), will be temporarily authorized by a
similar Check predicate in the symbolic composition process.

Symbolic Evaluation Profile symbolic evaluation (PSE) can be generalized
into a new symbolic evaluation (SE), presented in Fig. 10. The latter performs
both profile symbolic evaluation and partial evaluation on finite terms. The idea
of this transformation is to recursively project semantic rules on finite terms and
to eliminate intermediate attribute occurrences that are defined and used in the
produced semantic rules.

Indeed, rather than only projecting terms of the profile (function name) as in
PSE, that is, on expressions (f ā).result, the symbolic evaluation SE will project
terms related to each expression (f ā).w, where f stands for a type
constructor as well as for an attribute grammar profile. Since these expressions could be
coupled with inherited attribute occurrence definitions like (f ā).h = φ_h,
corresponding to parameters of the function represented by w, these definitions must
also be taken into account by the transformation.

To illustrate the use of symbolic evaluation as partial evaluation, consider the
term let g z = rev (cons a (cons b nil)) z. Applying FP-to-AG to this term
yields the following attribute grammar profile:

    aglet g z ->
        g.result = (cons a (cons b nil)).rev
        (cons a (cons b nil)).h = z

Then, the symbolic evaluation (Fig. 10) can be applied to these terms. The
first step of this application is presented in Fig. 11. Two further steps of this
transformation lead to g.result = (cons b (cons a z)).

So, symbolic evaluation performs partial evaluation on finite terms.

    (cons head tail -> cons.rev = tail.rev ; tail.h = cons head cons.h) ∈ P
    σ = [head := a][tail := cons b nil][cons.h := z]
    Σ ≝ { g.result = (cons b nil).rev ; (cons b nil).h = cons a z }
    Check(g, cons, Σ)
    --------------------------------------------------
    P ⊢ g z -> g.result = (cons a (cons b nil)).rev ; (cons a (cons b nil)).h = z
      ⇒ g z -> Σ

Fig. 11. Example of SE application for rev (cons a (cons b nil)) z
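
The effect of these steps can be imitated by running rev on terms that contain symbolic variables. The following Python sketch is not the SE transformation itself but a simplified model of its result on this example; the term encoding and the names `var`, `rev_symbolic` and `g` are assumptions.

```python
NIL = ("nil",)


def var(name):
    """A symbolic variable, i.e. a value unknown at transformation time."""
    return ("var", name)


def cons(head, tail):
    return ("cons", head, tail)


def rev_symbolic(x, h):
    """Evaluate rev as far as the known constructors of x allow."""
    if x[0] == "nil":
        return h                                   # nil.rev = nil.h
    if x[0] == "cons":
        return rev_symbolic(x[2], cons(x[1], h))   # rules of the cons pattern
    # x is not a known constructor: keep a residual function call.
    return ("call", "rev", x, h)


def g(z):
    return rev_symbolic(cons("a", cons("b", NIL)), z)
```

Applied to a symbolic z, g returns the term cons b (cons a z), as stated above. When the list itself is only partially known, a residual call to rev is left on the unknown part, which mirrors the reintroduction of a function call when the rule cannot be applied.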

This generalization of the profile symbolic evaluation into the symbolic
evaluation used as a partial evaluation mechanism implies that the complexity of
this transformation depends directly on that of the treated terms. In practice, the
number of symbolic evaluation applications must be arbitrarily limited in order to
prevent infinite loops, for instance in the partial evaluation of an infinite list reversal.
Nevertheless, at any stage of this process, a part of the computation has been
symbolically performed.

Composition Getting back to our running example, consider the definition
of the function revflat which flattens a tree and then reverses the obtained list.

    let revflat t l h = rev (flat t l) h

Intuitively, in the context of attribute grammar notation, this composition
involves the two sets of attributes Att_flat = {flat, l} and Att_rev = {rev, h}.

More generally, consider an attribute grammar F (e.g., flat), producing an
intermediate data structure to be consumed by another attribute grammar G
(e.g., rev). Two sets of attributes are involved in this composition. The first
one, Att_F, contains all the attributes used to construct the intermediate
data-structure. The second one, Att_G, contains the attributes of G.

As in the descriptional composition of classical attribute grammars [10], the
idea of the symbolic composition is to project the attributes of Att_G (e.g., Att_rev)
everywhere an attribute of Att_F (e.g., Att_flat) is defined. This global operation
brings the equations that specify a computation over the intermediate
data-structure onto its construction. The basic step of this projection (Proj) is presented
in Fig. 12. Then, the application of the symbolic evaluation will eliminate the
useless constructors.

From the complexity point of view, the projection step is essentially similar
to the classical descriptional composition [10], that is, quadratic: the composition
of two attribute grammars, respectively using n and m attributes, leads to m*n
attributes in the resulting attribute grammar, with as many semantic rules.

However, a point remains undefined: how to find the application sites for the
projection steps Proj? As announced, the predicate Check_PSE is temporarily
relaxed into Check, authorizing expressions like (x.a).b. In fact, all these expressions
are precisely the sites where deforestation could be performed (e.g., (t.flat).rev).

With this relaxed predicate Check and from the definition of the function
revflat, we obtain the blocks presented in Fig. 13 (the first is for the revflat profile,
and the others correspond to the attribute grammars flat and rev). In the blocks building
the intermediate data structure, potential application sites for the projection step
Proj are underlined, and a * highlights the construction to be deforested.

Fig. 14 shows the projection step for the pattern leaf, and all applications of
this step yield the blocks in the left part of Fig. 15.

Now, symbolic evaluation can be tried on annotated sites, performing the
real deforestation. The first annotated site is not a potential site for the
symbolic evaluation application, since l is neither an attribute grammar (profile) call
nor a type pattern constructor. In this case, as wherever Check is not verified,
the computational context is reintroduced in the form of an attribute grammar
(function) call (rev l (t.l).h). This functional call retrieval, together with the
linearity and distinct pattern (c ≠ f) conditions of the Check predicate, avoids infinite
unfolding and ensures termination of the process (with the arbitrary limit for
symbolic evaluation application mentioned in the partial evaluation discussion).

On the other hand, a symbolic evaluation step is successfully applied on the
second annotated site, actually eliminating a cons construction. Finally, new
attributes are created by renaming attributes a.b into a_b (when a ∈ Att_F and
b ∈ Att_G). More precisely, (x.a).b is transformed into x.a_b.

Then, the basic constituents of the symbolic composition are defined:

    Symbolic Composition = renaming ∘ (SE) ∘ (Proj)

Thus, for the function revflat, the symbolic composition leads to the deforested
attribute grammar presented in the right part of Fig. 15, where four
attributes have been generated. Producing a functional evaluator for this attribute
grammar yields the functions¹⁰ revflat, f1 and f2 presented in Fig. 16.

The function f1, corresponding to attributes l_h (its result) and flat_h (its
argument), performs the construction of a list. The function f2, corresponding to

¹⁰ Functions f1 and f2 respectively correspond to the traversals (passes) determined by
the attribute grammar evaluator generator.



    a ∈ Att_F          s̄ = Att_S_G          h̄ = Att_H_G
    -------------------------------------------------- (Proj)
    Att_G, Att_F ⊢ x.a = e  ⇒  (x.a).s = (e).s    ∀s ∈ s̄
                               (e).h = (x.a).h    ∀h ∈ h̄

    Att_G, Att_F ⊢ eq ⇒ Σ means that, while considering G ∘ F, the equation eq
    is transformed into the set of equations Σ.
    Att_S_G is the set of synthesized attributes of Att_G.
    Att_H_G is the set of inherited attributes of Att_G.

Fig. 12. Projection step

    revflat t l h ->
        revflat.result = (t.flat).rev
        (t.flat).h = h
        t.l = l

    node left right ->
        node.flat = left.flat
        left.l = right.flat
        right.l = node.l
    leaf n ->
        leaf.flat = cons n leaf.l    *

    cons head tail ->
        cons.rev = tail.rev
        tail.h = cons head cons.h
    nil ->
        nil.rev = nil.h

Fig. 13. Blocks for revflat before projection steps



    flat ∈ Att_flat     s̄ = Att_S_rev = {rev}     h̄ = Att_H_rev = {h}
    --------------------------------------------------
    Att_rev, Att_flat ⊢ leaf.flat = cons n leaf.l
        ⇒ (leaf.flat).rev = (cons n leaf.l).rev
          (cons n leaf.l).h = (leaf.flat).h

Fig. 14. Example of Proj application for the pattern leaf n
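
The projection step is purely syntactic, which the following Python sketch tries to convey. It is a minimal illustration in which equations are pairs of strings; the function name `proj` and this string encoding are assumptions, not part of the method as published.

```python
def proj(occurrence, expr, synthesized, inherited):
    """Project the consumer attributes onto the equation `occurrence = expr`.

    Returns the new equations as (left-hand side, right-hand side) pairs:
    one per synthesized attribute s, (x.a).s = (e).s, and
    one per inherited attribute h, (e).h = (x.a).h.
    """
    rules = [(f"({occurrence}).{s}", f"({expr}).{s}") for s in synthesized]
    rules += [(f"({expr}).{h}", f"({occurrence}).{h}") for h in inherited]
    return rules


# Projection of Att_rev = {rev, h} on the leaf equation of flat.
leaf_rules = proj("leaf.flat", "cons n leaf.l", ["rev"], ["h"])

# Projection on the three equations of the node pattern of flat.
node_equations = [
    ("node.flat", "left.flat"),
    ("left.l", "right.flat"),
    ("right.l", "node.l"),
]
node_rules = [
    rule
    for occurrence, expr in node_equations
    for rule in proj(occurrence, expr, ["rev"], ["h"])
]
```

Here `leaf_rules` reproduces the two equations of Fig. 14, and the three equations of the node pattern become six, one per pair of producer equation and consumer attribute.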



attributes flat_rev (its result) and l_rev (its argument), only propagates its
argument along the tree. Then, the second parameter in the call f2 t (rev l (f1 t h))
corresponds to the semantic rule t.l_rev = (rev l t.l_h) in the profile of the
attribute grammar. Indeed, since t.l_h stands for the call (f1 t h) and since t.l_rev
corresponds to the second argument of f2, the latter stands for rev l (f1 t h).

The intermediate list is no longer constructed and revflat is deforested. This
completes the presentation of our declarative deforestation methods on this typical
example. Of course, this technique works equally well for simpler functions,
without intermediate construction in accumulating parameters.
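As an illustration only, the following Python sketch transcribes the functions of Fig. 16 and compares them with a direct composition of flat and rev read off the blocks of Fig. 13. The tuple encodings of trees and cons lists, and the helper names `from_py` and `revflat_deforested`, are assumptions made for this sketch and do not come from the paper.

```python
# Assumed encodings: a tree is ("leaf", n) or ("node", left, right);
# a list is None (nil) or a pair (head, tail) (cons).

def flat(t, l):
    """flat with accumulating parameter l, following the blocks of Fig. 13."""
    if t[0] == "leaf":
        return (t[1], l)                  # leaf.flat = cons n leaf.l
    _, left, right = t
    return flat(left, flat(right, l))     # left.l = right.flat, right.l = node.l

def rev(xs, h):
    """rev with accumulating parameter h (cons/nil blocks)."""
    while xs is not None:
        head, xs = xs
        h = (head, h)                     # tail.h = cons head cons.h
    return h                              # nil.rev = nil.h

def revflat(t, l, h):
    """Original composition: builds the intermediate list flat(t, l)."""
    return rev(flat(t, l), h)

def f1(t, h):
    """First pass of Fig. 16: conses the leaves of t onto h."""
    if t[0] == "leaf":
        return (t[1], h)
    _, left, right = t
    return f1(right, f1(left, h))

def f2(t, l):
    """Second pass of Fig. 16: only propagates its argument along the tree."""
    if t[0] == "leaf":
        return l
    _, left, right = t
    return f2(left, f2(right, l))

def revflat_deforested(t, l, h):
    """Deforested version: no intermediate list for flat is constructed."""
    return f2(t, rev(l, f1(t, h)))

def from_py(items):
    """Helper (hypothetical): build a cons list from a Python list."""
    out = None
    for x in reversed(items):
        out = (x, out)
    return out

tree = ("node", ("node", ("leaf", 1), ("leaf", 2)), ("leaf", 3))
print(revflat(tree, from_py([4, 5]), from_py([0])))
print(revflat_deforested(tree, from_py([4, 5]), from_py([0])))
```

Both calls print the same cons list, which is what the equations of Fig. 15 predict for this input.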

5 Conclusion 

This paper shows that a fully declarative approach to program transformation
can resolve a tenacious problem of deforestation: deforesting in accumulating
parameters. The symbolic composition presented in this paper comes from
a broad comparison of deforestation techniques [8] and from the observation
that fixed recursion schemes, provided by data type specifications, are not
flexible enough to catch all intermediate data structure constructions in function
compositions. Several approaches have attempted to abstract these recursion schemes
in order to refine their manipulation, for instance by categorical representation
[28,13]. We were at first surprised that these elaborate methods do not succeed
on deforestations that are performed in the context of attribute grammar
transformations. But two points set attribute grammars apart. First, attribute grammars use
revflat t l h ->
    revflat.result = (t.flat).rev
    (t.flat).h = h
    (t.l).rev = (l).rev        (l is neither a function
    (l).h = (t.l).h             nor a constructor call)
node left right ->
    (node.flat).rev = (left.flat).rev
    (left.flat).h = (node.flat).h
    (left.l).rev = (right.flat).rev
    (right.flat).h = (left.l).h
    (right.l).rev = (node.l).rev
    (node.l).h = (right.l).h
leaf n ->
    (leaf.flat).rev = (cons n leaf.l).rev
    (cons n leaf.l).h = (leaf.flat).h

[after symbolic evaluation (SE) and renaming:]

aglet revflat =
revflat t l h ->
    revflat.result = t.flat_rev
    t.flat_h = h
    t.l_rev = (rev l t.l_h)
node left right ->
    node.flat_rev = left.flat_rev
    left.flat_h = node.flat_h
    left.l_rev = right.flat_rev
    right.flat_h = left.l_h
    right.l_rev = node.l_rev
    node.l_h = right.l_h
leaf n ->
    leaf.flat_rev = leaf.l_rev
    leaf.l_h = cons n leaf.flat_h

Fig. 15. Attribute grammar revflat before and after symbolic evaluation and
renaming



let revflat t l h = f2 t (rev l (f1 t h))

let f1 t h = case t with
    node left right ->
        f1 right (f1 left h)
    leaf n -> cons n h

let f2 t l = case t with
    node left right ->
        f2 left (f2 right l)
    leaf n -> l

Fig. 16. Functions corresponding to the deforested attribute grammar revflat



fully declarative specifications, independently of any evaluation method, thanks
to an operational semantics based on equation systems and dependency
resolution. Next, this declarative approach leads them to use inherited attributes
instead of supplementary arguments in order to specify top-down propagations
or computations; this allows all computations, particularly intermediate data
structure constructions, to be uniformly specified and then uniformly treated
by transformations. This reinforces our conviction that the declarative formalism
of attribute grammars is simple and appropriate for this kind of transformation.

Moreover, symbolic composition extends descriptional composition: first,
it can now be used as a partial evaluation mechanism, and second, it can be
applied to terms with function compositions, and not only to a single composition
of two distinct attribute grammars (attribute coupled grammars [10]) that are
isolated from any context. For the attribute grammar community, this stands as
the main contribution of this paper.

Nevertheless, as we wanted the presentation in this paper to be intuitive
and convincing, accepted programs were limited by attribute grammar
restrictions. For instance, non-linear terms, forbidden by the Check predicates, or
higher order specifications are not addressed in this presentation for technical
reasons due to the attribute grammar formalism. We have now formalized
a complete system that includes and extends both symbolic composition
and the declarative essence of the attribute grammar formalism. Equational
semantics, fully detailed in [7], is able to encode an abstract representation of
the operational semantics of a program. It supports simple transformations
that can be combined into more complex ones. Its prototype implementation,
EQS, is available and performs deforestation and partial evaluation (at
http://www-rocq.inria.fr/~correnso/agdoc/index.html). Coupled with an
FP-to-EQS translation, similar to FP-to-AG, EQS is able to deforest higher order
functional programs, even allowing some non-linear terms. Since the EQS
formalization is highly theoretical and language independent, the method and the
transformation pipeline we have presented in this paper can be viewed as an
intuitive presentation of this current and future work.

Finally, this work is part of a more general study addressing genericity
and reusability problems. The goal is to provide a set of high level
transformational tools, able to abstract a given program and then to specialize it for
several distinct contexts. We have compared [6] some attribute grammar tools
[17,24,23,5] with similar approaches in different programming paradigms
(polytypic programming [14], adaptive programming [20]). Again, it appears in this
context that the declarative aspects of attribute grammars make them particularly
suitable for program transformations and that they should be viewed more as
an abstract representation of a specification than as a programming language.
Abstract. We formally characterize partial evaluation of functional programs
as a normalization problem in an equational theory, and derive a
type-based normalization-by-evaluation algorithm for computing normal
forms in this setting. We then establish the correctness of this algorithm
using a semantic argument based on Kripke logical relations. For simplicity,
the results are stated for a non-strict, purely functional language; but
the methods are directly applicable to stating and proving correctness of
type-directed partial evaluation in ML-like languages as well.



1 Introduction 

The goal of partial evaluation (PE) is as follows: given a program $\vdash p : S \times D \to R$
of two arguments, and a fixed “static” argument $s : S$, produce a specialized
program $\vdash p_s : D \to R$ such that for all “dynamic” $d : D$, $Eval(p_s\,d) = Eval(p\,(s,d))$.
That is, running the specialized program on the dynamic argument is equivalent
to running the original program on both the static and the dynamic one.

In a functional language, it is of course trivial to come up with such a $p_s$: just
take $p_s = \lambda d.\, p\,(s,d)$. That is, the specialized program simply invokes the original
program with a constant first argument. But such a $p_s$ is likely to be suboptimal:
the knowledge of $s$ may already allow us to perform some simplifications that
are independent of $d$. For example, consider the power function:

$$power(n, x) = \mathbf{if}\ n = 0\ \mathbf{then}\ 1\ \mathbf{else}\ x \times power(n - 1, x)$$

Suppose we want to compute the third power of several numbers. We can achieve
this using the trivially specialized program:

$$power_3 = \lambda x.\, power(3, x)$$

But using a few simple rules derived from the semantics of the language, we can
safely transform $power_3$ to the much more efficient

$$power_3 = \lambda x.\, x \times (x \times (x \times 1))$$

* Part of this work was carried out at the Laboratory for Foundations of Computer 
Science, University of Edinburgh, supported by a EuroFOCS research fellowship. 

G. Nadathur (Ed.): PPDP’99, LNCS 1702, pp. 378-395, 1999. 
Using further arithmetic identities, we can also easily eliminate the multiplication
by 1. On the other hand, if only the argument x were known, we could not
simplify much: the specialized program would in general still need to contain a
recursive definition and a conditional test, in addition to the multiplication.
(Note that, even when x is 0 or 1, the function as defined should still diverge for
negative values of n.)
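The difference between the trivial and the simplified specialization of power can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the string-based unfolding in `specialize_power` are hypothetical and are not the algorithm developed in this paper; they merely show what eliminating the static recursion and conditional amounts to.

```python
def power(n, x):
    """The general power function: n is the exponent, x the base."""
    return 1 if n == 0 else x * power(n - 1, x)

def power3_trivial(x):
    """Trivial specialization: just fix the static argument."""
    return power(3, x)

def power3_specialized(x):
    """Specialization after unfolding the recursion and the test on n."""
    return x * (x * (x * 1))

def specialize_power(n):
    """Hypothetical hand-written specializer for a static, non-negative n.

    Returns the residual function together with its source text.
    """
    if n < 0:
        # The function as defined diverges here; we refuse instead of looping.
        raise ValueError("power does not terminate for negative n")
    body = "1"
    for _ in range(n):
        body = f"x * ({body})"
    return eval(f"lambda x: {body}"), body

residual, source = specialize_power(3)
print(source)
print(power3_trivial(5), power3_specialized(5), residual(5))
```

The residual source contains neither a recursive call nor a conditional, only the multiplications that depend on the dynamic argument.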

To facilitate automation of the task, partial evaluation is often expressed as 
a two-phase process, usually referred to as off-line PE [14]: 

1. A binding-time annotation phase, which identifies all the operations that can 
be performed using just the static input. This can be done either mechan- 
ically by a binding-time analysis (often based on abstract interpretation), 
or - if the intended usage of the program is clear and the annotations are 
sufficiently intuitive and non-intrusive - as part of the original program. 

2. A specialization phase, which takes the annotated program and the static
input, and produces a simplified $p_s$, in which all the operations marked as
static have been eliminated.

The annotations must of course be consistent, i.e., a subcomputation in the 
program cannot be classified as static if its result can not necessarily be found 
from only the static input. But they may be conservative by classifying some 
computations as dynamic even if they could in fact be performed at special- 
ization time. Techniques for accurate binding-time analysis have been studied 
extensively [14]. In the following we will therefore limit our attention to the sec- 
ond phase, i.e., to efficiently specializing programs that are already binding-time 
separated. 

A particularly simple way of phrasing specialization is as a general-purpose
simplification of the trivially specialized program $\lambda d.\, p\,(s,d)$: contracting
β-redexes and eliminating static operations as their inputs become known. What
makes this approach attractive is the technique of “reduction-free normalization”
or “normalization by evaluation”, already known from logic and category
theory [2,3,7]. A few challenges arise, however, with extending these results to a
programming-language setting. Most notably:

- Interpreted base types and their associated static operations. These need to
be properly accounted for, in addition to the β-reduction.

- Unrestricted recursion. This prevents a direct application of the usual
strong-normalization results. That is, not every well-typed term even has a normal
form; and not every reduction strategy will find it when it does exist.

- Call-by-value languages, and effects other than non-termination. In such a
setting, the usual βη-conversions are actually unsound: unrestricted
rearrangement of side effects may completely change the meaning of a program.

We will treat the first two concerns in detail. The call-by-value case uses the 
same principles, but for space reasons we will only briefly outline the necessary 
changes. 

The paper is organized as follows: Section 2 introduces our programming 
language, the notion of a binding-time separated signature, and our desiderata 
for a partial evaluator; Section 3 presents the type-directed partial evaluation
algorithm; and Section 4 shows its correctness with respect to the criteria in
Section 2. Finally, Section 5 presents a few variations and extensions, and Section 6
concludes and outlines directions for further research.



2 A Small Language 

2.1 The Framework and One-Level Language 

Our prototypical functional language has the following syntax of types and terms:

$$\sigma ::= b \mid \sigma_1 \to \sigma_2$$

$$E ::= l \mid c_{\sigma_1,\ldots,\sigma_n} \mid x \mid \lambda x^{\sigma}.\,E \mid E_1\,E_2$$

Here $b$ ranges over a set of base types listed in some signature $\Sigma$, $l$ over a set
$\Sigma(b)$ of literals (numerals, truth values, etc.) for each base type $b$, and $c$ over
a set of (possibly polymorphic) function constants in $\Sigma$. Adding finite-product
types would be completely straightforward throughout the paper, but we omit
this extension for conciseness.

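A typing context $\Gamma$ is a finite mapping of variable names to well-formed types
over $\Sigma$. The typing rules for terms are then standard:

$$\frac{l \in \Sigma(b)}{\Gamma \vdash_\Sigma l : b}
\qquad
\frac{\Sigma(c_{\sigma_1,\ldots,\sigma_n}) = \sigma}{\Gamma \vdash_\Sigma c_{\sigma_1,\ldots,\sigma_n} : \sigma}
\qquad
\frac{\Gamma(x) = \sigma}{\Gamma \vdash_\Sigma x : \sigma}$$

$$\frac{\Gamma, x : \sigma_1 \vdash_\Sigma E : \sigma_2}{\Gamma \vdash_\Sigma \lambda x^{\sigma_1}.\,E : \sigma_1 \to \sigma_2}
\qquad
\frac{\Gamma \vdash_\Sigma E_1 : \sigma_1 \to \sigma_2 \qquad \Gamma \vdash_\Sigma E_2 : \sigma_1}{\Gamma \vdash_\Sigma E_1\,E_2 : \sigma_2}$$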

An interpretation of a signature $\Sigma$ is a triple $\mathcal{I} = (\mathcal{B}, \mathcal{L}, \mathcal{C})$. $\mathcal{B}$ maps every base
type $b$ in $\Sigma$ to a predomain (i.e., a bottomless cpo, usually discretely ordered).
Then we can interpret every type phrase $\sigma$ over $\Sigma$ as a domain (pointed cpo):

$$[\![b]\!]^{\mathcal{B}} = \mathcal{B}(b)_\bot$$

$$[\![\sigma_1 \to \sigma_2]\!]^{\mathcal{B}} = [\![\sigma_1]\!]^{\mathcal{B}} \to [\![\sigma_2]\!]^{\mathcal{B}}$$

where the interpretation of an arrow type is the full continuous function space.
We also define the meaning of a typing assignment $\Gamma$ as a labelled product of
the domains interpreting the types of individual variables.

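Further, for any base type $b$ and literal $l \in \Sigma(b)$, the interpretation must
specify an element $\mathcal{L}_b(l) \in \mathcal{B}(b)$; and for every type instance of a polymorphic
constant, an element $\mathcal{C}(c_{\sigma_1,\ldots,\sigma_n}) \in [\![\Sigma(c_{\sigma_1,\ldots,\sigma_n})]\!]^{\mathcal{B}}$. Then we interpret a
well-typed term $\Gamma \vdash_\Sigma E : \sigma$ as a (total) continuous function $[\![E]\!]^{\mathcal{I}} : [\![\Gamma]\!]^{\mathcal{B}} \to [\![\sigma]\!]^{\mathcal{B}}$,

$$\begin{array}{rcl}
[\![l]\!]^{\mathcal{I}}\rho &=& \mathbf{val}^\bot\, \mathcal{L}_b(l) \\
[\![x]\!]^{\mathcal{I}}\rho &=& \rho\,x \\
[\![\lambda x^{\sigma}.\,E]\!]^{\mathcal{I}}\rho &=& \lambda a^{[\![\sigma]\!]^{\mathcal{B}}}.\ [\![E]\!]^{\mathcal{I}}(\rho[x \mapsto a]) \\
[\![E_1\,E_2]\!]^{\mathcal{I}}\rho &=& [\![E_1]\!]^{\mathcal{I}}\rho\ ([\![E_2]\!]^{\mathcal{I}}\rho)
\end{array}$$

(We use the following notation: $\mathbf{val}^\bot x$ and $\mathbf{let}^\bot\ y \Leftarrow x\ \mathbf{in}\ f\,y$ are lifting-injection
and the strict extension of $f$, respectively; $b \to x \mathbin{[\,]} y$ chooses between $x$ and $y$
based on the truth value $b$.)

When $\vdash_\Sigma E : b$ is a closed term of base type, we define the partial function
$Eval_{\mathcal{I}}$ by $Eval_{\mathcal{I}}(E) = n$ if $[\![E]\!]^{\mathcal{I}}\emptyset = \mathbf{val}^\bot n$ and undefined otherwise.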

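Definition 1 (standard static language). We define a simple functional
language (essentially PCF [17]) by taking the signature $\Sigma_s$ as follows. The base
types are int and bool; the literals, $\Sigma_s(\mathrm{int}) = \{\ldots, -1, 0, 1, 2, \ldots\}$ and
$\Sigma_s(\mathrm{bool}) = \{\mathrm{true}, \mathrm{false}\}$; and the constants,

$$\begin{array}{ll}
+, -, \times : \mathrm{int} \to \mathrm{int} \to \mathrm{int} & \mathrm{if}_\sigma : \mathrm{bool} \to \sigma \to \sigma \to \sigma \\
=, < \ : \mathrm{int} \to \mathrm{int} \to \mathrm{bool} & \mathrm{fix}_\sigma : (\sigma \to \sigma) \to \sigma
\end{array}$$

(We write the binary operations infixed for readability.) The interpretation of
this signature is also as expected:

$$\begin{array}{rcl}
\mathcal{B}_s(\mathrm{bool}) &=& \mathbb{B} = \{tt, ff\} \\
\mathcal{B}_s(\mathrm{int}) &=& \mathbb{Z} = \{\ldots, -1, 0, 1, 2, \ldots\} \\
\mathcal{C}_s(\star) &=& \lambda x^{\mathbb{Z}_\bot}.\ \lambda y^{\mathbb{Z}_\bot}.\ \mathbf{let}^\bot\ m \Leftarrow x\ \mathbf{in}\ \mathbf{let}^\bot\ n \Leftarrow y\ \mathbf{in}\ \mathbf{val}^\bot\,(m \star n) \qquad \star \in \{+, -, \times, =, <\} \\
\mathcal{C}_s(\mathrm{if}_\sigma) &=& \lambda x^{\mathbb{B}_\bot}.\ \lambda a_1^{[\![\sigma]\!]}.\ \lambda a_2^{[\![\sigma]\!]}.\ \mathbf{let}^\bot\ b \Leftarrow x\ \mathbf{in}\ b \to a_1 \mathbin{[\,]} a_2 \\
\mathcal{C}_s(\mathrm{fix}_\sigma) &=& \lambda f^{[\![\sigma]\!] \to [\![\sigma]\!]}.\ \bigsqcup_{i \in \omega} f^i(\bot_{[\![\sigma]\!]})
\end{array}$$

It is well known (computational adequacy of the denotational semantics for
call-by-name evaluation [17]) that with this interpretation, $Eval_{\mathcal{I}_s}$ is computable.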



2.2 The Binding-Time Separated Language 

Assume now that the signature $\Sigma$ is partitioned according to binding times,
$\Sigma = \Sigma_s, \Sigma_d$. We will write type and term constants from the static part overlined,
and the dynamic ones underlined. For simplicity, we require that the dynamic
base types do not come with any new literals, i.e., $\Sigma_d(\underline{b}) = \emptyset$. (If needed, they can
be added as dynamic constants.) However, some base types will be persistent,
i.e., have both static and dynamic versions with the same intended meaning. In
that case, we also include lifting functions $\$_b : \overline{b} \to \underline{b}$ in the dynamic signature.

We say that a type $\tau$ is fully dynamic if it is constructed from dynamic base
types only,

$$\tau ::= \underline{b} \mid \tau_1 \to \tau_2$$

We also reserve $\Delta$ for typing assumptions assigning fully dynamic types to all
variables. All term constants in $\Sigma_d$ must have fully dynamic types, and in
particular, polymorphic dynamic constants must only be instantiated by dynamic
types, e.g., $\Sigma_d(\underline{\mathrm{if}}_\tau) = \underline{\mathrm{bool}} \to \tau \to \tau \to \tau$.

We will always take the language from Definition 1 with the standard
semantics $\mathcal{I}_s$ as the static part. The dynamic signature typically also has some
intended evaluating interpretation $\mathcal{I}_d^e$; in particular, when $\Sigma_d$ is merely a copy
of $\Sigma_s$, we can use $\mathcal{I}_s$ directly for $\mathcal{I}_d^e$ (interpreting all lifting functions as
identities). Later, however, we will also introduce a “code-generating”, residualizing
interpretation.

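Example 1. Here are the four different binding-time annotations for the function
power : int → int → int (abbreviating int as $\iota$):

$$\begin{array}{l}
power_{ss} : \overline{\iota} \to \overline{\iota} \to \overline{\iota} = \lambda x^{\overline{\iota}}.\ \overline{\mathrm{fix}}_{\overline{\iota} \to \overline{\iota}}\,(\lambda p^{\overline{\iota} \to \overline{\iota}}.\ \lambda n^{\overline{\iota}}.\ \overline{\mathrm{if}}_{\overline{\iota}}\ (n \mathbin{\overline{=}} 0)\ 1\ (x \mathbin{\overline{\times}} p\,(n \mathbin{\overline{-}} 1))) \\
power_{sd} : \overline{\iota} \to \underline{\iota} \to \underline{\iota} = \lambda x^{\overline{\iota}}.\ \underline{\mathrm{fix}}_{\underline{\iota} \to \underline{\iota}}\,(\lambda p^{\underline{\iota} \to \underline{\iota}}.\ \lambda n^{\underline{\iota}}.\ \underline{\mathrm{if}}_{\underline{\iota}}\ (n \mathbin{\underline{=}} \$0)\ (\$1)\ (\$x \mathbin{\underline{\times}} p\,(n \mathbin{\underline{-}} \$1))) \\
power_{ds} : \underline{\iota} \to \overline{\iota} \to \underline{\iota} = \lambda x^{\underline{\iota}}.\ \overline{\mathrm{fix}}_{\overline{\iota} \to \underline{\iota}}\,(\lambda p^{\overline{\iota} \to \underline{\iota}}.\ \lambda n^{\overline{\iota}}.\ \overline{\mathrm{if}}_{\underline{\iota}}\ (n \mathbin{\overline{=}} 0)\ (\$1)\ (x \mathbin{\underline{\times}} p\,(n \mathbin{\overline{-}} 1))) \\
power_{dd} : \underline{\iota} \to \underline{\iota} \to \underline{\iota} = \lambda x^{\underline{\iota}}.\ \underline{\mathrm{fix}}_{\underline{\iota} \to \underline{\iota}}\,(\lambda p^{\underline{\iota} \to \underline{\iota}}.\ \lambda n^{\underline{\iota}}.\ \underline{\mathrm{if}}_{\underline{\iota}}\ (n \mathbin{\underline{=}} \$0)\ (\$1)\ (x \mathbin{\underline{\times}} p\,(n \mathbin{\underline{-}} \$1)))
\end{array}$$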

Note how the fixed-point and conditional operators are classified as static or 
dynamic, depending on the binding time of the second argument. 



2.3 Static Normal Forms and PE 



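Definition 2 (static normal forms). Among the well-typed, purely dynamic
terms $\Delta \vdash_{\Sigma_d} E : \tau$, we distinguish those in normal and atomic form:

$$\frac{\Delta \vdash^{at}_{\Sigma_d} E : \underline{b}}{\Delta \vdash^{nf}_{\Sigma_d} E : \underline{b}}
\qquad
\frac{\Delta, x : \tau_1 \vdash^{nf}_{\Sigma_d} E : \tau_2}{\Delta \vdash^{nf}_{\Sigma_d} \lambda x^{\tau_1}.\,E : \tau_1 \to \tau_2}$$

$$\frac{l \in \Sigma(\overline{b})}{\Delta \vdash^{at}_{\Sigma_d} \$_b\, l : \underline{b}}
\qquad
\frac{\Sigma_d(c_{\tau_1,\ldots,\tau_n}) = \tau}{\Delta \vdash^{at}_{\Sigma_d} c_{\tau_1,\ldots,\tau_n} : \tau}
\qquad
\frac{\Delta(x) = \tau}{\Delta \vdash^{at}_{\Sigma_d} x : \tau}$$

$$\frac{\Delta \vdash^{at}_{\Sigma_d} E_1 : \tau_1 \to \tau_2 \qquad \Delta \vdash^{nf}_{\Sigma_d} E_2 : \tau_1}{\Delta \vdash^{at}_{\Sigma_d} E_1\,E_2 : \tau_2}$$

In particular, such terms contain no static constants nor β-redexes. (Incidentally,
this also means that if we had included polymorphic lets in the source language,
they would simply get unfolded in the resulting normal forms.)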

We can now define a notion of normalization based on (undirected) equality, 
rather than on (directed) reduction [3]. Since lambda-abstracting a dynamic- 
type term over a dynamic-type variable still yields a dynamic term, it suffices to 
be able to compute normal forms of closed terms: 
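Definition 3 (static equivalence and normalization). Let $\mathcal{I}_s$ be an
interpretation of $\Sigma_s$. We say that two terms $\vdash_{\Sigma_s,\Sigma_d} E : \sigma$ and $\vdash_{\Sigma_s,\Sigma_d} E' : \sigma$ are
statically equivalent wrt. $\mathcal{I}_s$, written $\models_{\mathcal{I}_s} E = E'$, if for all $\mathcal{I}_d$ interpreting $\Sigma_d$,
$[\![E]\!]^{\mathcal{I}_s,\mathcal{I}_d} = [\![E']\!]^{\mathcal{I}_s,\mathcal{I}_d}$. A static-normalization function is then a computable
partial function NF on well-typed terms such that

1. If $\vdash_{\Sigma_s,\Sigma_d} E : \tau$ and $NF(E) = \tilde{E}$ then $\vdash^{nf}_{\Sigma_d} \tilde{E} : \tau$ and $\models_{\mathcal{I}_s} \tilde{E} = E$.

2. If also $\vdash_{\Sigma_s,\Sigma_d} E' : \tau$ and $\models_{\mathcal{I}_s} E' = E$ then $NF(E') =_\alpha NF(E)$ (α-equivalence).

We further say that such a normalization function is complete if whenever an
$\tilde{E}$ satisfying the conditions in (1) exists, $NF(E)$ is defined.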

Example 2. One can check that a complete static-normalization function NF for 
our language must have the following properties: 

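$$\begin{array}{l}
NF(\$\,(power_{ss}\ 3\ 4)) = \$\,81 \\
NF(\lambda x^{\underline{\iota}}.\ power_{ds}\ x\ 3) = \lambda x^{\underline{\iota}}.\ x \mathbin{\underline{\times}} (x \mathbin{\underline{\times}} (x \mathbin{\underline{\times}} \$1)) \\
NF(\lambda x^{\underline{\iota}}.\ power_{ds}\ x\ (-2)) \text{ undefined}
\end{array}$$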

Note first that ordinary evaluation is just a special case of static normaliza- 
tion. The second example shows how static normalization achieves the partial- 
evaluation goal of the introduction. Finally, some terms have no static normal 
form at all; in that case, the normalization function must diverge. 

There are two basic ways to compute normal forms. The usual one is based on
term rewriting, repeatedly locating and contracting β-redexes and applications
of static constants (and possibly η-expanding the final result). But there is an
alternative technique, normalization by evaluation, which utilizes the existing
mechanism of complete-program evaluation (defined only for closed terms of
base type) as the normalization engine for general terms. This is the subject of
the next section.



3 A Normalization-by-Evaluation Algorithm 

We now present Type-Directed Partial Evaluation (TDPE), an efficient algo- 
rithm for computing static normal forms. 



3.1 Representing Programs as Data 

To compute normal forms, we need a way of representing them as program
outputs. Assume therefore that we have base cpos rich enough to contain unique
representations of all well-formed dynamic types, variable names, and (open)
static-normal form terms, i.e., sets $T$, $V$, and $\Lambda$ with injective operations

$$\begin{array}{ll}
BASE_b : T & CST : V \times T^* \to \Lambda \\
ARR : T \times T \to T & VAR : V \to \Lambda \\
 & LAM : V \times T \times \Lambda \to \Lambda \\
LIT_b : \mathcal{B}_s(b) \to \Lambda & APP : \Lambda \times \Lambda \to \Lambda
\end{array}$$
(where $T^*$ is the set of finite lists of elements from $T$). Using these, we can define
injective representation functions for types and terms, such that $\ulcorner\tau\urcorner \in T$ for any
dynamic type $\tau$, and $\ulcorner E \urcorner \in \Lambda$ for $\Delta \vdash^{nf}_{\Sigma_d} E : \tau$, by equations such as

$$\ulcorner \tau_1 \to \tau_2 \urcorner = ARR(\ulcorner\tau_1\urcorner, \ulcorner\tau_2\urcorner) \qquad
\ulcorner \lambda x^{\tau}.\,E \urcorner = LAM(x, \ulcorner\tau\urcorner, \ulcorner E \urcorner) \qquad
\ulcorner \$_b\, l \urcorner = LIT_b(\mathcal{L}_b(l))$$

We do not need to require a priori that all elements of $T$ and $\Lambda$ represent
well-formed types and terms (although this is easy to achieve), let alone well-typed
ones, or ones in normal form. For example, we could simply take all of $T$, $V$,
and $\Lambda$ as the type of finite character strings. Or, even more radically, Gödel-code
everything in terms of integer arithmetic only.

To account for potentially diverging normalizations, we must now turn the
set of term representations into a pointed cpo. To also model the generation of
“new” variable names, however, we will not work with elements of $\Lambda_\bot$ directly,
but instead introduce a term-family representation,

$$\hat{\Lambda} = N \to \Lambda_\bot$$

where $\mathbb{N} \subseteq N$. The intent is that for $e \in \hat{\Lambda}$ and $i \in \mathbb{N}$, if $e\,i = \mathbf{val}^\bot \ulcorner E \urcorner$ then all
bound variables of $E$ will belong to the set $\{g_i, g_{i+1}, \ldots\} \subseteq V$.

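We also define wrapper functions to conveniently build representations of
lambda-terms without committing to particular choices of bound-variable names:

$$\begin{array}{rcl}
\widehat{LIT}_b : \mathcal{B}_s(b) \to \hat{\Lambda} &=& \lambda n.\ \lambda i.\ \mathbf{val}^\bot\, LIT_b(n) \\
\widehat{CST} : V \times T^* \to \hat{\Lambda} &=& \lambda (c, \vec{t}\,).\ \lambda i.\ \mathbf{val}^\bot\, CST(c, \vec{t}\,) \\
\widehat{VAR} : V \to \hat{\Lambda} &=& \lambda v.\ \lambda i.\ \mathbf{val}^\bot\, VAR(v) \\
\widehat{LAM} : T \times (V \to \hat{\Lambda}) \to \hat{\Lambda} &=& \lambda (t, e).\ \lambda i.\ \mathbf{let}^\bot\ l \Leftarrow e\,g_i\,(i+1)\ \mathbf{in}\ \mathbf{val}^\bot\, LAM(g_i, t, l) \\
\widehat{APP} : \hat{\Lambda} \times \hat{\Lambda} \to \hat{\Lambda} &=& \lambda (e_1, e_2).\ \lambda i.\ \mathbf{let}^\bot\ l_1 \Leftarrow e_1\,i\ \mathbf{in}\ \mathbf{let}^\bot\ l_2 \Leftarrow e_2\,i\ \mathbf{in}\ \mathbf{val}^\bot\, APP(l_1, l_2)
\end{array}$$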

(These definitions would not be needed in a setting with support for higher-order 
abstract syntax. But one of our goals is to show rigorously that all the variable- 
name manipulations can be done efficiently by the normalization algorithm itself, 
without relying on higher-level operations such as capture-avoiding substitution 
or higher-order matching.) 

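Example 3. Let $t = ARR(BASE_{\mathrm{int}}, BASE_{\mathrm{int}})$. Then

$$\widehat{LAM}\,(t, \lambda v^{V}.\ \widehat{APP}\,(\widehat{VAR}\,v, \widehat{LIT}_{\mathrm{int}}\,3))\ 7
= \mathbf{val}^\bot\, LAM(g_7, t, APP(VAR(g_7), LIT_{\mathrm{int}}(3)))
= \mathbf{val}^\bot\, \ulcorner \lambda g_7^{\underline{\iota} \to \underline{\iota}}.\ g_7\,(\$\,3) \urcorner$$

That is, we can apply an element of $\hat{\Lambda}$ constructed using the wrapper functions
to a starting index and obtain the representation of a concrete lambda-term.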
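The wrapper functions can be sketched in Python as functions from a starting index to a term. This is a simplified illustration, not the paper's formal construction: terms and types are encoded as tuples (an assumption of the sketch), the capitalized Python functions play the role of the hatted wrappers, and divergence (the lifted bottom element) is not modelled at all.

```python
# Assumed tuple encoding of type representations.
INT = ("BASE", "int")

def ARR(t1, t2):
    return ("ARR", t1, t2)

# Term families: each wrapper returns a function from an index i to a term.
def LIT(n):
    return lambda i: ("LIT", n)

def VAR(v):
    return lambda i: ("VAR", v)

def APP(e1, e2):
    # Both subterms are built from the same starting index.
    return lambda i: ("APP", e1(i), e2(i))

def LAM(t, body):
    # The bound variable is named after the current index; the body is
    # built starting from the next index, so nested binders never clash.
    def family(i):
        g = f"g{i}"
        return ("LAM", g, t, body(g)(i + 1))
    return family

t = ARR(INT, INT)
example3 = LAM(t, lambda v: APP(VAR(v), LIT(3)))
print(example3(7))
```

Applying `example3` to the starting index 7 yields a term whose bound variable is `g7`, mirroring Example 3; applying it to another index yields an alpha-equivalent term.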
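3.2 The Residualizing Interpretation

We now define a non-standard interpretation $\mathcal{I}_d^r = (\mathcal{B}_d^r, \emptyset, \mathcal{C}_d^r)$ of the dynamic
signature $\Sigma_d$, based on representations of syntactic program fragments and
operations constructing such representations. We will abbreviate $[\![-]\!]^{\mathcal{I}_s,\mathcal{I}_d^r}$ as $[\![-]\!]^r$.

For the interpretation of $\Sigma_d$’s types, we take

$$\mathcal{B}_d^r(\underline{b}) = \hat{\Lambda}.$$

This allows us to define for any dynamic $\tau$ a pair of continuous functions,
often called reification, $\downarrow^\tau : [\![\tau]\!]^r \to \hat{\Lambda}$, and reflection, $\uparrow_\tau : \hat{\Lambda} \to [\![\tau]\!]^r$, as follows:

$$\begin{array}{rcl}
\downarrow^{\underline{b}} &=& \lambda e^{\hat{\Lambda}}.\ e \\
\downarrow^{\tau_1 \to \tau_2} &=& \lambda f^{[\![\tau_1]\!]^r \to [\![\tau_2]\!]^r}.\ \widehat{LAM}\,(\ulcorner\tau_1\urcorner, \lambda v^{V}.\ \downarrow^{\tau_2}(f\,(\uparrow_{\tau_1}(\widehat{VAR}\,v)))) \\
\uparrow_{\underline{b}} &=& \lambda e^{\hat{\Lambda}}.\ e \\
\uparrow_{\tau_1 \to \tau_2} &=& \lambda e^{\hat{\Lambda}}.\ \lambda a^{[\![\tau_1]\!]^r}.\ \uparrow_{\tau_2}(\widehat{APP}\,(e, \downarrow^{\tau_1} a))
\end{array}$$

Informally, reification constructs a syntactic representation of a
“well-behaved” semantic value, while reflection constructs such values from
pieces of syntax. For the residualizing interpretations of $\Sigma_d$’s term constants
we now take

$$\begin{array}{rcl}
\mathcal{C}_d^r(c_{\tau_1,\ldots,\tau_n}) &=& \uparrow_{\Sigma_d(c_{\tau_1,\ldots,\tau_n})}(\widehat{CST}\,(c, [\ulcorner\tau_1\urcorner, \ldots, \ulcorner\tau_n\urcorner])) \\
\mathcal{C}_d^r(\$_b) &=& \lambda x^{\mathcal{B}_s(b)_\bot}.\ \mathbf{let}^\bot\ n \Leftarrow x\ \mathbf{in}\ \widehat{LIT}_b\, n
\end{array}$$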

That is, a general dynamic constant is simply interpreted as the reflection of its 
type-annotated name, while a lifting function forces evaluation of its argument 
and constructs a representation of the literal result. (It is this forcing of static 
subcomputations that may cause the whole specialization process to diverge.) 
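Reification and reflection can be sketched in Python by recursion on a type representation. The sketch makes several assumptions that are not in the paper: types and terms are tuples, term families are plain functions from an index to a term, the host language is call-by-value Python rather than a non-strict language, and divergence is not modelled. The names `reify`, `reflect` and `lift` are chosen for this illustration.

```python
INT = ("BASE", "int")

def ARR(t1, t2):
    return ("ARR", t1, t2)

# Term-family wrappers (index -> term), as in the text before Example 3.
def LIT(n):
    return lambda i: ("LIT", n)

def VAR(v):
    return lambda i: ("VAR", v)

def APP(e1, e2):
    return lambda i: ("APP", e1(i), e2(i))

def LAM(t, body):
    return lambda i: ("LAM", f"g{i}", t, body(f"g{i}")(i + 1))

def reify(ty, value):
    """Turn a residualizing semantic value of type ty into a term family."""
    if ty[0] == "BASE":
        return value                      # base values already are term families
    _, t1, t2 = ty
    return LAM(t1, lambda v: reify(t2, value(reflect(t1, VAR(v)))))

def reflect(ty, e):
    """Turn a term family e into a semantic value of type ty."""
    if ty[0] == "BASE":
        return e
    _, t1, t2 = ty
    return lambda a: reflect(t2, APP(e, reify(t1, a)))

def lift(n):
    """Lifting function: n is a static integer, already evaluated by Python."""
    return LIT(n)

# The term of Example 4: (\x. \f. f ($ (x + 1))) 2, at type (int -> int) -> int.
value = (lambda x: lambda f: f(lift(x + 1)))(2)
family = reify(ARR(ARR(INT, INT), INT), value)
print(family(7))
```

The static addition is performed by Python itself while the term is being reified, so the printed term contains the literal 3 and no trace of `x + 1`.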

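Example 4. Applying the reification function to the residualizing meaning of a
term not in static normal form, we obtain:

$$\begin{array}{l}
\downarrow^{(\underline{\iota} \to \underline{\iota}) \to \underline{\iota}}([\![(\lambda x^{\overline{\iota}}.\ \lambda f^{\underline{\iota} \to \underline{\iota}}.\ f\,(\$_{\mathrm{int}}\,(x \mathbin{\overline{+}} 1)))\ 2]\!]^r\,\emptyset) \\
\quad = \downarrow^{(\underline{\iota} \to \underline{\iota}) \to \underline{\iota}}(\lambda \varphi.\ [\![f\,(\$_{\mathrm{int}}\,(x \mathbin{\overline{+}} 1))]\!]^r\,[x \mapsto \mathbf{val}^\bot 2, f \mapsto \varphi]) \\
\quad = \downarrow^{(\underline{\iota} \to \underline{\iota}) \to \underline{\iota}}(\lambda \varphi.\ \varphi\,(\mathcal{C}_d^r(\$_{\mathrm{int}})\,(\mathcal{C}_s(+)\,(\mathbf{val}^\bot 2)\,(\mathbf{val}^\bot 1)))) \\
\quad = \widehat{LAM}\,(\ulcorner \underline{\iota} \to \underline{\iota} \urcorner, \lambda v^{V}.\ \downarrow^{\underline{\iota}}((\lambda \varphi.\ \varphi\,(\widehat{LIT}_{\mathrm{int}}\,3))\,(\uparrow_{\underline{\iota} \to \underline{\iota}}(\widehat{VAR}\,v)))) \\
\quad = \widehat{LAM}\,(ARR(BASE_{\mathrm{int}}, BASE_{\mathrm{int}}), \lambda v^{V}.\ \widehat{APP}\,(\widehat{VAR}\,v, \widehat{LIT}_{\mathrm{int}}\,3))
\end{array}$$

And applying this value to 7 as the first bound-variable index gives us precisely
the normal-form term from Example 3 at the end of the previous section.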



3.3 The Algorithm 

So far, we have looked at a semantic property: from the interpretation of a
lambda-term in a non-standard denotational semantics of the dynamic signature,
we can apparently recover that term’s normal form. But this semantic result
also forms the basis of an eminently practical normalization algorithm, obtained
by pulling back the components of the residualizing semantics to the level of
program syntax.

We say that a realization $\Phi$ of a signature $\Sigma$ in a programming language given
by $\Sigma_{pl}$ is a substitution assigning to every type constant of $\Sigma$, a type over $\Sigma_{pl}$,
and to every term constant of $\Sigma$, a $\Sigma_{pl}$-term. (For simplicity, we assume that
the literals of $\Sigma$’s base types are also literals of the corresponding $\Sigma_{pl}$-types.)

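Suppose now that $\Sigma_s \subseteq \Sigma_{pl}$, $\mathcal{I}_{pl}$ agrees with $\mathcal{I}_s$, and $\Sigma_{pl}$ also has some
distinguished base types typ and exp with $\mathcal{B}_{pl}(\mathrm{typ}) = T$ and $\mathcal{B}_{pl}(\mathrm{exp}) = \Lambda$, as
well as the associated (strict) constructor constants. Note that $[\![\mathrm{int} \to \mathrm{exp}]\!]^{\mathcal{B}_{pl}} = \hat{\Lambda}$
(with $N = \mathbb{Z}_\bot$). Then we can realize the base types of $(\Sigma_s, \Sigma_d)$ in $\Sigma_{pl}$ by

$$\Phi^r_{\overline{b}} = b \qquad \Phi^r_{\underline{b}} = \mathrm{int} \to \mathrm{exp}$$

so that $[\![\sigma\{\Phi^r\}]\!]^{\mathcal{B}_{pl}} = [\![\sigma]\!]^r$. Further, for any $\tau$, we can define closed $\Sigma_{pl}$-terms,

$$name_\tau : \mathrm{typ} \qquad reify_\tau : \tau\{\Phi^r\} \to \mathrm{int} \to \mathrm{exp} \qquad reflect_\tau : (\mathrm{int} \to \mathrm{exp}) \to \tau\{\Phi^r\}$$

such that $[\![name_\tau]\!]^{\mathcal{I}_{pl}}\emptyset = \mathbf{val}^\bot \ulcorner\tau\urcorner$, $[\![reify_\tau]\!]^{\mathcal{I}_{pl}}\emptyset = \downarrow^\tau$, and $[\![reflect_\tau]\!]^{\mathcal{I}_{pl}}\emptyset = \uparrow_\tau$. And
using those, we can define realizations of the term constants from $(\Sigma_s, \Sigma_d)$:

$$\begin{array}{rcl}
\Phi^r_{\overline{c}_{\sigma_1,\ldots,\sigma_n}} &=& c_{\sigma_1,\ldots,\sigma_n} \\
\Phi^r_{\underline{c}_{\tau_1,\ldots,\tau_n}} &=& reflect_{\Sigma_d(c_{\tau_1,\ldots,\tau_n})}\,(\lambda i.\ CST\,(c, [name_{\tau_1}, \ldots, name_{\tau_n}])) \\
\Phi^r_{\$_b} &=& \lambda n.\ \lambda i.\ LIT_b\, n \quad (\text{given } \mathcal{C}_{pl}(LIT_b) = \lambda x.\ \mathbf{let}^\bot\ n \Leftarrow x\ \mathbf{in}\ \mathbf{val}^\bot\, LIT_b(n))
\end{array}$$

so that $[\![E\{\Phi^r\}]\!]^{\mathcal{I}_{pl}} = [\![E]\!]^r$. Note in particular that the realizations of static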
base types and constants are exactly the corresponding constructs from the 
programming language. This means that we can even use the usual syntactic 
sugar (such as letrec for applications of fix) in the static parts of programs to 
be specialized. 

We can use this realization to express our normalization algorithm: 

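Definition 4 (TDPE). For any dynamic type $\tau$, we define the partial function
$TDPE_\tau : \{E \mid\ \vdash_{\Sigma_s,\Sigma_d} E : \tau\} \rightharpoonup \{\tilde{E} \mid\ \vdash^{nf}_{\Sigma_d} \tilde{E} : \tau\}$ by

$$TDPE_\tau(E) = \tilde{E} \quad \text{if} \quad Eval_{\mathcal{I}_{pl}}(reify_\tau\ E\{\Phi^r\}\ 0) = \ulcorner \tilde{E} \urcorner.$$

This TDPE is clearly computable; we will show in Section 4 that it is indeed a
complete static-normalization function.

Note that we can view TDPE as an instance of “cogen-based specialization” [13],
in which a “compiler generator” is used to syntactically transform a
(binding-time annotated) program $\vdash p : S \times D \to R$ into its generating extension
$\vdash p^{\ddagger} : S \to \mathrm{exp}$, with the property that for any $s : S$, $Eval(p^{\ddagger}\,s) = \ulcorner p_s \urcorner$. That is, we
effectively take

$$p^{\ddagger} = \lambda s^{S}.\ reify_{D \to R}\,(\lambda d^{D}.\ p\{\Phi^r\}\,(s, d))\ 0.$$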
TDPE shares the general high efficiency of cogen-based PE [12]. Formulating
the task in terms of static normalization over a binding-time separated signature,
however, permits a very precise yet concise syntactic characterization of the
specialized program $p_s$. Also, unlike traditional cogens, TDPE does not require
any binding-time annotation of lambdas and applications in the source program.

As a further advantage, the signatures and realizations can be very
conveniently expressed in terms of parameterized modules in a Standard ML-style
module system. The program to be specialized is simply written as the body
of a functor parameterized by the signature of dynamic operations. The functor
can then be applied to either an evaluating ($\Phi^e$) or a residualizing ($\Phi^r$) structure.
That is, the cogen pass does not even require an explicit syntactic traversal of
the program, making it possible to enrich the static fragment of the language
(e.g., with pattern matching) without any modification to the partial evaluator
itself.
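The functor-based organization can be imitated in Python by passing the dynamic operations as a parameter. The sketch below is an analogy rather than the Standard ML formulation: `make_power` stands in for the functor, the two `SimpleNamespace` values stand in for the evaluating and residualizing structures, and the tuple term encoding, the constant name `"mul"` and the printer `show` are assumptions made for this illustration. The program is the annotation of power in which x is dynamic and n is static.

```python
from types import SimpleNamespace

INT = ("BASE", "int")

def ARR(t1, t2):
    return ("ARR", t1, t2)

# Term families: functions from a fresh-name index to a term.
def LIT(n): return lambda i: ("LIT", n)
def CST(c): return lambda i: ("CST", c)
def VAR(v): return lambda i: ("VAR", v)
def APP(e1, e2): return lambda i: ("APP", e1(i), e2(i))
def LAM(t, body): return lambda i: ("LAM", f"g{i}", t, body(f"g{i}")(i + 1))

def reify(ty, value):
    if ty[0] == "BASE":
        return value
    _, t1, t2 = ty
    return LAM(t1, lambda v: reify(t2, value(reflect(t1, VAR(v)))))

def reflect(ty, e):
    if ty[0] == "BASE":
        return e
    _, t1, t2 = ty
    return lambda a: reflect(t2, APP(e, reify(t1, a)))

# Two "structures" implementing the dynamic signature {mul, lift}.
evaluating = SimpleNamespace(mul=lambda a: lambda b: a * b, lift=lambda n: n)
residualizing = SimpleNamespace(
    mul=reflect(ARR(INT, ARR(INT, INT)), CST("mul")),  # reflected constant name
    lift=LIT,
)

def make_power(dyn):
    """The 'functor body': power with dynamic x and static n."""
    def power(x, n):
        if n < 0:
            raise ValueError("diverges for negative n")  # refused in this sketch
        return dyn.lift(1) if n == 0 else dyn.mul(x)(power(x, n - 1))
    return power

def show(term):
    """Hypothetical pretty-printer for the tuple encoding."""
    tag = term[0]
    if tag == "LIT":
        return f"${term[1]}"
    if tag in ("CST", "VAR"):
        return term[1]
    if tag == "LAM":
        return f"(λ{term[1]}. {show(term[3])})"
    return f"({show(term[1])} {show(term[2])})"

power_eval = make_power(evaluating)
power_resid = make_power(residualizing)
residual = reify(ARR(INT, INT), lambda x: power_resid(x, 3))(0)
print(power_eval(2, 3))
print(show(residual))
```

The same program text is run twice: with the evaluating structure it computes a number, and with the residualizing structure it produces the specialized term, without the specializer ever inspecting the source of `power`.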

It is also worth noting that the $\tau$-indexed families above can be
straightforwardly defined even in ML’s type system: consider the type abbreviation

$$tdpe(\alpha) = \mathrm{typ} \times (\alpha \to \mathrm{int} \to \mathrm{exp}) \times ((\mathrm{int} \to \mathrm{exp}) \to \alpha).$$

Then for any dynamic type $\tau$, we can construct a term of type $tdpe(\tau\{\Phi^r\})$ whose
value is the triple $(name_\tau, reify_\tau, reflect_\tau)$. We do this by defining once and for
all two ML-typable terms

$$base : tdpe(\mathrm{int} \to \mathrm{exp}) \qquad arrow : \forall \alpha, \beta.\ tdpe(\alpha) \times tdpe(\beta) \to tdpe(\alpha \to \beta),$$

with which we can then systematically construct the required value. The
technique is explained in more detail elsewhere [18].
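The two combinators can be sketched in Python, where the typing discipline is of course not checked. The sketch assumes the same tuple encoding and index-passing term families as before; `base` takes the base-type name as an argument here, which is a choice of this illustration and not part of the signature given above.

```python
# Term families: functions from a fresh-name index to a term.
def LIT(n): return lambda i: ("LIT", n)
def VAR(v): return lambda i: ("VAR", v)
def APP(e1, e2): return lambda i: ("APP", e1(i), e2(i))
def LAM(t, body): return lambda i: ("LAM", f"g{i}", t, body(f"g{i}")(i + 1))

def base(name):
    """Triple (name, reify, reflect) for a dynamic base type."""
    return (("BASE", name), lambda v: v, lambda e: e)

def arrow(a, b):
    """Build the triple for a function type from the triples of its parts."""
    name_a, reify_a, reflect_a = a
    name_b, reify_b, reflect_b = b
    name = ("ARR", name_a, name_b)

    def reify(f):
        # Abstract over a fresh variable, reflected at the argument type.
        return LAM(name_a, lambda v: reify_b(f(reflect_a(VAR(v)))))

    def reflect(e):
        # Apply the residual term to the reified argument.
        return lambda x: reflect_b(APP(e, reify_a(x)))

    return (name, reify, reflect)

int_ = base("int")
name, reify, reflect = arrow(arrow(int_, int_), int_)
print(name)
print(reify(lambda f: f(LIT(3)))(7))
```

Each triple is built once per type, so the type-indexed recursion of reification and reflection is carried out when the triple is constructed rather than on every use.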

Finally, the dynamic polymorphic constants (e.g., fix) now take explicit
representations of the types at which they are being instantiated as extra
arguments. In the evaluating realization, these extra arguments are ignored; but the
residualizing realization uses them to construct the name-reflect-reify triple for
the instantiated type, given the corresponding triples for $\tau_1, \ldots, \tau_n$.

3.4 Applications 

Despite its apparent simplicity, TDPE has been successfully used for several
non-trivial examples; see Danvy’s tutorial for an overview [6]. Many of these actually
use the slightly more complicated call-by-value version [4] (see Section 5.4).
Because it exploits the highly-optimized evaluation mechanism of a functional
language, such a partial evaluator is typically much faster than one representing
and manipulating the source program as an explicit value.

Let us just mention here that in addition to stand-alone, source-to-source 
PE, the TDPE framework can be particularly naturally employed as a “seman- 
tic back-end” for executable language specifications. That is, if we explicitly 
parameterize such a specification by the signature of runtime operations (in- 
cluding conditionals, fixed points, etc.), we can instantiate this signature with 
either the runtime realization, yielding an interpreter, or with the residualizing 
signature, yielding a compiler [8,11]. Amusingly, the specializer does not even
need the actual text of the specification, only its representation as an already
compiled module.

4 Showing Correctness 

In this section, we sketch a correctness proof for the TDPE algorithm, i.e., 
that it computes static normal forms when they exist. For the case without 
static constants, essentially the same algorithm can actually be extracted directly 
from the standard (syntactic) proof of strong normalization for the simply typed 
lambda-calculus [1]; but it is not clear if this approach can be extended to a richer 
programming-language setting. 

Instead, our proof uses the technique of semantic logical relations, structured 
similarly to Gomard and Jones’s proof of Lambda-mix [14, 8.8], but accounting 
more rigorously for potential divergence and for the generation of “fresh” variable 
names. It also admits a richer type structure for the dynamic language. (We can 
still treat the untyped variant as a special case; see Section 5.1.) 

4.1 Properties of the Term-Family Representation 

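Let some evaluating dynamic interpretation $\mathcal{I}_d^e$ of $\Sigma_d$ be given. We abbreviate
$[\![-]\!]^{\mathcal{I}_s,\mathcal{I}_d^e}$ as $[\![-]\!]^e$ (and $[\![-]\!]^{\mathcal{I}_s,\mathcal{I}_d^r}$ as $[\![-]\!]^r$, as before).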

Definition 5 (partial meaning relation, ;^). For any A, let 

j]Z\ = max({i -|- 1 j G dom Z\} U {0}) 

(so if i > (,A then gi ^ dom A). Then for any A, t, s G {nf,at} (as used in 
Definition 2), 5 G e G A, and a G we define a relation by 

J >-^ a Vi > (,A.ei = A\/3E.ei = vaP ^Ff AA E : rA|i?p'i(5 = a 

This roughly expresses that “|e]J = a”, but taking into account variable renam- 
ing, partiality, and simplification: for all sufficiently large starting indices i, if 
e i converges, it must represent a normal-form term with the right meaning. We 
check that this relation is semantically well behaved: 

Definition 6 (admissibility). We say that a relation $R \subseteq A \times A'$ between
two pointed cpos is admissible (or inclusive) if it is chain-complete (i.e., for
all chains $(a_i)_i$ and $(a'_i)_i$, if $\forall i.\ (a_i, a'_i) \in R$ then also $(\bigsqcup_i a_i, \bigsqcup_i a'_i) \in R$) and
pointed (i.e., $(\bot_A, \bot_{A'}) \in R$).

Lemma 1 (a is admissible). For any A, 5 G r, and s G {nf,at}, the 

relation {(e, a) \ e <5 >-^ a} is admissible. 

Proof. Straightforward, noting that admissible relations are closed under arbitrary 
intersection, and that any chain in $\Lambda_\bot$ is eventually constant. □ 

Although the output of TDPE is a closed program, we still need to account for 
the typing and meaning of open program fragments as they are being constructed 
and put into context: 

Definition 7 (Kripke structure). A world is a pair {A,S) where S G 
Such worlds are partially ordered by 

$(\Delta', \delta') \ge (\Delta, \delta) \iff \forall x \in \operatorname{dom}\Delta.\ \Delta'(x) = \Delta(x) \wedge \delta'\,x = \delta\,x$ 

Lemma 2 {>■ is Kripke). If e S a and {A', S') > {A,S), then also 
e@^' S' a. 

Proof. Follows easily from standard weakening properties of the typing relation 
(if A h A : r then A' \- E : t) and denotational semantics ( l-E] S' = |A] ^ ) . □ 



Lemma 3 (meanings of term families). The wrapper functions have the 
following properties: 

1. If A{v) = T then VARv S Sv. 

2. If n G Bs{b) then LIT^n®^ S y-f" vaD n 

3. OT(c, fn , . . . , V]) <5 

4- If for all V ^ dom A and a G ev i-^ «] fa 

then LAM{Wi,e) S /. 

5. If ei (5 f and C 2 S a then APP (ei, 62 ) S y!):* fa 

Proof. Straightforward verification in all cases. (For case 4, we exploit the fact 
that |ri]^d is always non-empty; this shortcut can be avoided by using a slightly 
more complicated world structure throughout the proof.) □ 

4.2 Soundness of TDPE 

We prove soundness by formally relating the standard and the residualizing 
interpretations of types and terms. 

Definition 8 (logical relation, ~cr). For any type a and world (A,S), we 
define a relation a S a' , where a G |cr]r and a' G |cr]e by: 

n (5 n' n = n' 

e S n' e S n' 

f S f ^ V(A', S') > (A, <5). 

\/a,a'.a@^ S' a,' ^ f a@^ S' ^a 2 f 



We first check the standard requirements: 






Lemma 4 (~cr is admissible). For any a, the relation {(a, a') \ a@^ S a’} 
is admissible. 

Proof. Simple induction on a. For b, use admissibility of (Lemma 1). □ 

Lemma 5 (~cr is Kripke). If e 6 a and {A', 6') > then also 

e@^' S' a. 

Proof. This is a standard result about Kripke logical relations; the proof is by a 
simple induction on cr, using Lemma 2 for the base case a = b. □ 

We obtain our main correctness result from two lemmas: 

Lemma 6 (soundness, type part). For any dynamic type r, 

1. Ifa@^Sr^ra' t/ien (5 o'. 

2. Ife@^S yf a' then tr e a'. 

Proof. Straightforward induction on r, using the properties of the wrapper func- 
tions from Lemma 3. □ 
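The interplay between reification, reflection, and the wrapper functions that Lemma 6 reasons about can be illustrated with a small sketch. It is not the paper's definition: it assumes a single dynamic base type written "b", encodes function types as pairs, ignores partiality and static constants, and represents a term family as a Python function from a starting index to a term. The names `reify`, `reflect`, and `normalize` are chosen here for illustration.

```python
# Terms are tuples: ("var", x), ("lam", x, body), ("app", f, a).
# A term family maps a starting index i to a term whose bound
# variables are named g_i, g_{i+1}, ... (partiality is not modelled).

def VAR(x):
    return lambda i: ("var", x)

def LAM(body):
    # body maps a variable name to the term family of the abstraction body
    return lambda i: ("lam", f"g{i}", body(f"g{i}")(i + 1))

def APP(e1, e2):
    return lambda i: ("app", e1(i), e2(i))

# Types: "b" is a dynamic base type; a pair (t1, t2) stands for t1 -> t2.

def reify(ty, value):
    """Turn a semantic value of type ty into a term family."""
    if ty == "b":
        return value
    dom, cod = ty
    return LAM(lambda x: reify(cod, value(reflect(dom, VAR(x)))))

def reflect(ty, family):
    """Turn a term family of type ty into a semantic value."""
    if ty == "b":
        return family
    dom, cod = ty
    return lambda arg: reflect(cod, APP(family, reify(dom, arg)))

def normalize(ty, value):
    # Start name generation at index 0, as for a closed term.
    return reify(ty, value)(0)

twice = lambda f: lambda x: f(f(x))
b_to_b = ("b", "b")
result = normalize((b_to_b, b_to_b), twice)
```

Here `result` is the syntax tree of a term of the shape λg0. λg1. g0 (g0 g1), obtained purely by running the host-language function `twice` on reflected arguments.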

Lemma 7 (soundness, term part). Let {A, 6) be a world, and let p € |T]r 
and p' € |T]e. Then for any well-typed term F E : a, if \/x G 

domT. px S ~_r(x) p' x then |i?]rp S |£']ep'. 

Proof. This is the usual Kripke logical relations lemma, proved by straightfor- 
ward induction on E. The only non-standard case is that of if = c,.^ for 
which we need Lemma 6(2). For E = fixo-, we use fixed-point induction, i.e.. 
Lemma 4 together with the chain-based construction of Cs(fixo-) in Definition 1. 

□ 



Theorem 1 (soundness). TDPE is a static-normalization function. 

Proof. Observe first that TDPEt.{E) = iff |'^(|iJ]r0)O = vaK '^E' . Now, by 
Lemma 7, since empty environments are vacuously related, |if]r0@ 0 [£’]e0- 

And thus by Lemma 6(1), (|A]r0) @ 0 [A’]e0, which, by the definitions of 

and jj, gives us precisely that E : r and 

For the second part, if E' E then in particular lif'Jr = lAjp. And thus 
by the observation above, we must have TDPEt.{E') = TDPEr{E). □ 

4.3 Completeness of TDPE 

To supplement the above partial-correctness result, we can also show that if a 
suitable E exists, the algorithm will actually find it. This proof uses a much 
simpler logical-relation argument, capturing the intuition that the algorithm 
necessarily converges when applied to a term containing no static constants: 

Definition 9 (totality predicate). For any dynamic type $\tau$, we define a predicate 
$T_\tau \subseteq [\![\tau]\!]^{\mathrm r}$ by 

$T_b = \{e \in \hat\Lambda \mid \forall i.\ e\,i \ne \bot\}$ 

$T_{\tau_1 \to \tau_2} = \{f \in [\![\tau_1]\!]^{\mathrm r} \to [\![\tau_2]\!]^{\mathrm r} \mid \forall a \in T_{\tau_1}.\ f\,a \in T_{\tau_2}\}$ 

As before, we then obtain the result from two main lemmas: 

Lemma 8 (completeness, type part). For any dynamic type t, 

1. If a G T-r then for all i > 0, a i _L. 

2. If for all i>0, eif^F then trS G T.r- 

Proof. Straightforward induction on r, by inspection of the definitions of the 
wrapper functions. □ 

Lemma 9 (completeness, term part). Let 5 G Then for any well- 

typed term A E : t, if\/xG dom A.SxG 2Vi(a;) then G Tr- 

Proof. Standard induction on E, using Lemma 8(2) for dynamic constants. □ 

Theorem 2 (completeness). TDPE is a complete static-normalization func- 
tion. 

Proof. Suppose E : r has the static normal form E : r. Then in 

particular |if]r = By Lemma 9, g T^, and thus by 

Lemma 8(1), |^(|A]r0)O = (|.EpS 0) 0 _L, so TDPEr(E) is defined. □ 

5 Variations and Extensions 

5.1 Lambda-Mix 

We can use the previous results to show correctness of partial evaluation for 
languages like the one used for Lambda-mix [14, 8.8]. Here, the dynamic language 
is untyped. Or, more precisely, it has a single type d of dynamic values, and 
operators: 

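$$\frac{\Gamma, x{:}d \vdash E : d}{\Gamma \vdash \underline{\lambda}x.\,E : d} \qquad \frac{\Gamma \vdash E_1 : d \quad \Gamma \vdash E_2 : d}{\Gamma \vdash E_1 \,\underline{@}\, E_2 : d}$$ 

To model this in our typed framework, we let the dynamic signature Ad 
contain the single base type $d$, and constants $\phi : (d \to d) \to d$ and $\psi : d \to d \to d$. 
We can then treat dynamic lambda-abstraction and application as abbreviations: 

$\underline{\lambda}x.\,E = \phi(\lambda x.\,E)$ and $E_1 \,\underline{@}\, E_2 = \psi\,E_1\,E_2$ 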

In the evaluating dynamic semantics X| of Ad, the type constant $d$ is 
interpreted as a solution to the domain equation 

and $\phi$ and $\psi$ as the evident embedding and projection functions for the first 
summand. In the residualizing interpretation on the other hand, $d$ is the 
domain of syntactic term families, and the constants are reflected according to 
their types, as usual. In particular, $\mathit{reify}_d$ is simply the identity. 

The soundness result for TDPE then gives us that if E : d and 

Evalj^^x^{E) = 'if' then E : d and Evalx^{E) = Evalx^^x^{E). With a little 
more work we also obtain a similar statement for non-closed terms. 

Note finally that normalizing a term of type d a priori yields a simply-typed 
term over rather than an untyped one. In a static normal form, however, any 
occurrence of the constant $\phi$ will be applied to a syntactic lambda-abstraction, 
and $\psi$ will be applied to two arguments. Thus, the output of the partial evaluator 
can always be directly expressed in terms of the underlined abstraction and 
application operators. 

5.2 Gensym-Like Name Generation 

The term-family representation from Section 3.1 constructs terms in which the 
names of bound variables are derived from the number of enclosing lambdas; 
this convention is sometimes known as de Bruijn levels (not to be confused with 
de Bruijn indices). Although this is probably the simplest choice in a purely 
functional setting, there is nothing canonical about it. To more precisely capture 
the informal concept of newly-generated “fresh” variable names, we could instead 
take: 

A = Af ^ {Ax Af)x 
VAR = Xv.Xi.^ral^ {VAR{v),i) 

LAM = X{t, e). Ai.let^ {I, i') egi {i -I- 1) in vaE {LAM{gi,t, l),i') 

APR = A(ei,e 2 ). Ai.let^ {h,i') eif in let^ {hA”) ^ ^^i' in vaP {APP{li,l2),i”) 

(with analogous extensions for constants and literals). This scheme generates 
terms in which all bound- variable names are distinct. Then, after changing the 
conditional meaning relation to read: 

e 5 a 

Vi > jjZ\. ei = T V 3E, i! >i.ei = val^ ^ *0 b V\ E : x A |if]^ = a. 

we can check that Lemma 3 still holds, and therefore all the remaining construc- 
tions and proofs go through without further modifications. 
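The difference between the two naming conventions can be made concrete with a short sketch. It is only an illustration: type annotations on binders, constants, literals, and partiality are all omitted, and the helper names (`var_l`, `lam_l`, `app_l` for the level-based scheme; `var_g`, `lam_g`, `app_g` for the counter-threading scheme) are hypothetical.

```python
# Terms are tuples: ("var", x), ("lam", x, body), ("app", f, a).

def name(i):
    return f"g{i}"

# Level-based scheme: a term family maps a starting index to a term.
def var_l(v):
    return lambda i: ("var", v)

def lam_l(body):
    return lambda i: ("lam", name(i), body(name(i))(i + 1))

def app_l(e1, e2):
    # Both subterms start from the same index, so names may be reused.
    return lambda i: ("app", e1(i), e2(i))

# Gensym-like scheme: a term family maps an index to
# (term, next unused index), and the counter is threaded through.
def var_g(v):
    return lambda i: (("var", v), i)

def lam_g(body):
    def run(i):
        term, next_i = body(name(i))(i + 1)
        return ("lam", name(i), term), next_i
    return run

def app_g(e1, e2):
    def run(i):
        t1, i1 = e1(i)
        t2, i2 = e2(i1)   # the argument continues where the function stopped
        return ("app", t1, t2), i2
    return run

# (\x. x) (\y. y) under both schemes
levels = app_l(lam_l(var_l), lam_l(var_l))(0)
gensym, next_free = app_g(lam_g(var_g), lam_g(var_g))(0)
```

Under the level-based scheme both binders are called g0, since each sits under zero enclosing lambdas; under the threaded scheme the second binder receives the fresh name g1.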



5.3 On-Line Type-Directed Partial Evaluation 

Although TDPE generally works on binding-time separated signatures, it is ac- 
tually possible to give an on-line formulation, in which it is not necessary to 
explicitly annotate all base types and operations. Conceptually, we instead take 
;Bj(&) = Af— > (Bs{b) -I- A)j_. (In practice, when A is a conveniently inspectable 
type, it suffices to take = A and explicitly recognize A-values that represent 
literals.) The arithmetic operators then produce a static result if both 
arguments are static, and dynamic otherwise, possibly coercing one argument in 
the process. A similar extension works for conditionals. 
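The behaviour of such an on-line arithmetic operator can be sketched as follows. This is a simplified illustration, not the construction in the text: static values are plain Python integers, residual code is a tagged tuple, and the names `is_static`, `lift`, and `online_add` are hypothetical.

```python
def is_static(x):
    # Static values are host-language integers; anything else is residual code.
    return isinstance(x, int)

def lift(x):
    """Coerce a static integer into a residual literal; leave code unchanged."""
    return ("lit", x) if is_static(x) else x

def online_add(x, y):
    if is_static(x) and is_static(y):
        return x + y                      # both known: compute now
    return ("add", lift(x), lift(y))      # otherwise: emit residual code

known = online_add(1, 2)
mixed = online_add(("var", "x"), 2)
```

In the mixed case the static operand 2 is coerced to a literal so that the residual addition has two code operands.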

This scheme enables “opportunistic” simplifications, in cases where an 
operand is sometimes, but not always, statically known (or where a static anal- 
ysis cannot prove that it is known). Note, however, that we must still annotate 
occurrences of fix as static or dynamic, or otherwise prevent fruitless infinite 
expansion of a recursive function. For example, it is often possible to explicitly 
identify a particular function parameter as the one controlling the recursion, and 
only unfold calls in which that argument is a literal [5]. 

Of course, the price of these potential improvements is that the amount of 
simplification is less predictable: the output will still be in long $\beta\eta$-normal form, 
but it is no longer evident from the original program which operations will be 
performed statically, and which ones must remain in the specialized program. 

5.4 Call-by-Value and Effects 

For practical applications, a call-by-value variant of TDPE is usually preferable, 
and indeed the technique was first presented in this setting [4]. Let us briefly 
sketch the necessary changes from the call-by-name case here. 

To give a denotational semantics of an ML-like language, we consider an in- 
terpretation I to also explicitly include a monad for modeling effectful computa- 
tions. We usually want to be able to use any monad in the evaluating interpreta- 
tion; thus the notion of static equality must be safe for any dynamic effect. That 
is, instead of computing normal forms based on the strong $\beta\eta$-lambda-calculus, 
we now need a normalization-by-evaluation algorithm for Moggi’s computational 
lambda-calculus $\lambda_c$ [15]. 

Fortunately, much as a single residualizing interpretation of dynamic type 
and term constants suffices to compute call-by-name static normal forms sound 
for any dynamic interpretation, it turns out that a single “maximally general” 
residualizing interpretation of effects can be used to compute call-by-value nor- 
mal forms suitable for any dynamic monad. 

A particularly natural such residualizing monad is that of continuations with 
answer type A, which can be straightforwardly related to any dynamic monad for 
the purpose of the logical relation in Section 4.2. Moreover, we can still construct 
the corresponding residualizing realization <1>^ , as long as our programming lan- 
guage contains Scheme-style first-class continuations and state [10]. Incidentally, 
this construction also allows disjoint-union types (sums) to be naturally added 
to the language. The details are still under investigation, however, and will be 
reported in a forthcoming paper. 

6 Conclusions and Future Work 

We have given an account of type-directed partial evaluation that separates 
the specification of the problem (computation of static normal forms) from its 
implementation (normalization by evaluation); in previous work these tended to 
be intertwined. We also presented a correctness proof for the implementation, 
using logical relations over a simple denotational model of the binding-time 
separated language. To keep the details manageable, we restricted our scope to 
a purely functional language; but both the algorithm and the proof techniques 
extend to call-by-value languages with effects as well. 

Future work falls in two classes. First, there are a number of natural exten- 
sions to the framework and results in essentially the form they are presented 
here. In addition to the directions already mentioned in Section 5, one can also 
consider polyvariant specialization, run-time code generation, and other classical 
PE concepts in the context of TDPE. 

Second, it would be interesting to investigate how TDPE relates to more 
general work on linguistic support for staged computation, especially recent 
developments based on modal logics [9,16]. For example, it might be possible 
to generalize the notion of static normalization to such settings, and consider 
normalization-by-evaluation algorithms for type systems more expressive than 
simple types. 

Abstract. Given a program $P$, an unfold/fold program transformation 
system derives a sequence of programs $P = P_0, P_1, \ldots, P_n$, such that 
$P_{i+1}$ is derived from $P_i$ by application of either an unfolding or a folding 
step. Existing unfold/fold transformation systems for definite logic programs 
differ from one another mainly in the kind of folding transformations 
they permit at each step. Some allow folding using a single (possibly 
recursive) clause while others permit folding using multiple non-recursive 
clauses. However, none allow folding using multiple recursive clauses that 
are drawn from some previous program in the transformation sequence. 
In this paper we develop a parameterized framework for unfold/fold 
transformations by suitably abstracting and extending the proofs of ex- 
isting transformation systems. Various existing unfold/fold transforma- 
tion systems can be obtained by instantiating the parameters of the 
framework. This framework enables us to not only understand the rel- 
ative strengths and limitations of these systems but also construct new 
transformation systems. Specifically, we present a more general transformation 
system that permits folding using multiple recursive clauses 
that can be drawn from any previous program in the transformation se- 
quence. This new transformation system is also obtained by instantiating 
our parameterized framework. 



1 Introduction 

Some of the most extensively studied transformation systems for definite logic 
programs are the so called unfold/fold transformation systems. At a high level 
unfold and fold transformations can be viewed as follows. Definite logic programs 
consist of definitions of the form $A \text{ :- } \phi$ where $A$ is an atom and $\phi$ is a 
positive boolean formula over atoms. Unfolding replaces an occurrence of $A$ in 
a program with $\phi$ while folding replaces an occurrence of $\phi$ with $A$. Folding is 
called reversible if its effects can be undone by an unfolding, and irreversible 
otherwise. An unfold/fold transformation system for definite logic programs was 
first described in a seminal paper by Tamaki and Sato [20]. In the flurry of re- 
search activity that followed, a number of unfold/fold transformation systems 
were developed. Kanamori and Fujita [8] proposed a transformation system that 
was based on maintaining counters to guide folding. Maher described a system 
that permits only reversible folding [10]. The basic Tamaki-Sato system itself was 
extended in several directions (e.g., to handle folding with multiple clauses [7], 
negation [1,18,19]) and applied to practical problems (e.g., [2,3,12]). (See [11] for 
an excellent survey of research on this topic over the past decade). 

Correctness of Unfold/Fold Transformations Correctness proofs for unfold/fold 
transformations consider transformation sequences of the form $P_0, P_1, \ldots$, where 
$P_0$ is an initial program and $P_{i+1}$ is obtained from $P_i$ by applying an unfolding 
or folding transformation. The proofs usually show that all programs in the 
transformation sequence have the same least Herbrand model. It is easy to verify 
that transforming $P_i$ to $P_{i+1}$ using unfolding or folding is partially correct, i.e., 
the least model of $P_{i+1}$ is a subset of that of $P_i$. It is also easy to show, by 
induction on the structure of the proof trees, that unfolding transformation is 
totally correct, i.e., it preserves the least model. However, as illustrated below, 
indiscriminate folding may introduce circularity in definitions, thereby replacing 
finite proof paths with infinite ones. 

Consider the sequence of programs in Figure 1. In the figure, $P_1$ is derived by 
unfolding the occurrence of q(X) in the first clause of $P_0$. $P_2$ is derived from $P_1$ by 
folding the literal q(X) in the body of the second clause of predicate p into p(X) 
using the clause p(X) :- q(X) in $P_0$. Alternatively, consider the transformation 
sequence in Figure 2. By folding q(X) in the second clause of p in $P_1$ (using the 
second clause defining q in $P_1$), we obtain program $P_2$. Now folding q(X) in the 
second clause of q in $P_2$ (using the second clause of p in $P_1$), we get program $P_3$, 
whose least model differs from that of $P_0$. 
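The claim about least models can be checked mechanically on a finite fragment of the Herbrand universe. The sketch below is an illustration under stated assumptions: the ground term f^n(a) is encoded as the integer n, every clause has at most one variable X, only instances up to a fixed depth are considered, and the clause encoding and function names are hypothetical. The three programs are the initial program and the final programs of the two sequences as described in the text.

```python
# An atom pattern is (pred, k, is_var): pred(f^k(X)) if is_var, else pred(f^k(a)).
# A clause is (head_pattern, [body_patterns]).

def ground_instances(program, depth):
    """Yield (head, body) pairs of ground atoms (pred, n), with n encoding f^n(a)."""
    for head, body in program:
        atoms = [head] + body
        has_var = any(is_var for (_, _, is_var) in atoms)
        for x in (range(depth + 1) if has_var else [0]):
            ground = [(p, k + x if is_var else k) for (p, k, is_var) in atoms]
            if all(n <= depth for (_, n) in ground):
                yield ground[0], ground[1:]

def least_model(program, depth):
    """Naive bottom-up fixpoint over the depth-bounded ground instances."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in ground_instances(program, depth):
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

X, A = True, False
P0 = [(("p", 0, X), [("q", 0, X)]), (("q", 0, A), []), (("q", 1, X), [("q", 0, X)])]
# End of the first sequence: p(a). p(f(X)) :- p(X). q(a). q(f(X)) :- q(X).
P2_fig1 = [(("p", 0, A), []), (("p", 1, X), [("p", 0, X)]),
           (("q", 0, A), []), (("q", 1, X), [("q", 0, X)])]
# End of the second sequence: p(a). p(f(X)) :- q(f(X)). q(a). q(f(X)) :- p(f(X)).
P3_fig2 = [(("p", 0, A), []), (("p", 1, X), [("q", 1, X)]),
           (("q", 0, A), []), (("q", 1, X), [("p", 1, X)])]

m0 = least_model(P0, 3)
m2 = least_model(P2_fig1, 3)
m3 = least_model(P3_fig2, 3)
```

On this fragment `m0` and `m2` coincide, whereas `m3` contains only p(a) and q(a): the two mutually dependent clauses for p(f(X)) and q(f(X)) no longer have any finite proof.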

Transformation Systems with Irreversible Folding If the folding transformation 
is reversible, then since its effect can be undone by an unfolding, any partially 
correct unfold/fold transformation sequence is also totally correct. However, for 
reversibility, folding at step i of the transformation can only use the clauses in Pi. 



Program $P_0$: 
    p(X) :- q(X). 
    q(a). 
    q(f(X)) :- q(X). 

Program $P_1$: 
    p(a). 
    p(f(X)) :- q(X). 
    q(a). 
    q(f(X)) :- q(X). 

Program $P_2$: 
    p(a). 
    p(f(X)) :- p(X). 
    q(a). 
    q(f(X)) :- q(X). 

Fig. 1. An example of a correct unfold/fold transformation sequence 



398 Abhik Roychoudhury et al. 



Therefore reversibility is a restrictive condition that seriously limits the power of 
unfold/fold systems by disallowing many correct folding transformations, such as 
the one used to derive $P_2$ from $P_1$ in Figure 1. Hence almost all research on 
unfold/fold transformations has focused on constructing systems that permit 
irreversible folding. In such systems folding at step $i$ can use clauses that are 
not in $P_i$. For example, in the original and extended Tamaki-Sato systems [20,21] 
folding always uses clauses in $P_0$ whereas in the Kanamori-Fujita system [8] the 
clauses can come from any $P_j$ ($j \le i$). But ensuring total correctness of 
irreversible transformation sequences is difficult. In order to ensure that folding 
is still totally correct, these systems permit folding using only clauses with 
certain (syntactic) properties. For instance, the original Tamaki-Sato system 
permits folding using a single clause only (conjunctive folding) and this clause is 
required to be non-recursive. In [7] the above system was extended to allow folding 
with multiple clauses (disjunctive folding) but all the clauses are required to be 
non-recursive. Kanamori and Fujita [8] as well as Tamaki and Sato in a later 
paper [21] gave two different approaches for conjunctive folding using recursive 
clauses. But the design of a transformation system that allows folding in the 
presence of both disjunction and recursion has remained open so far. We will 
describe such a system in this paper. 

To generalize in this direction one needs to first understand the strengths 
and limitations of the above systems. The key observation is that, although the 
book-keeping needed to determine permissible foldings appears radically different 
in the different systems, there is a striking similarity in how the transformations 
are proved correct. Essentially, these systems associate some measure with different 
program elements, namely atoms and clauses, to determine whether folding 
is permissible in that step (e.g., “foldable” flag in [20], descent levels/strata 
numbers in [21], and counters in [8]). Moreover, they ensure that each transformation 
step maintains an invariant relating proofs in the derived program to the various 
measures (e.g., the notions of rank-consistency in [8,20], weight-consistency 
in [7] and $\mu$-completeness in [21]). This raises another interesting question: can 
we exploit the similarities in the correctness proofs of irreversible unfold/fold 
systems to develop an abstract framework? Such a framework will specify the 
obligations that must be satisfied to ensure total correctness and hence can 
simplify construction of unfold/fold systems to the extent that one is relieved of the 
burden of giving correctness proofs. We propose such a framework in this paper. 




Program $P_0$: 
    p(X) :- q(X). 
    q(a). 
    q(f(X)) :- q(X). 

Program $P_1$: 
    p(a). 
    p(f(X)) :- q(X). 
    q(a). 
    q(f(X)) :- q(X). 

Program $P_2$: 
    p(a). 
    p(f(X)) :- q(f(X)). 
    q(a). 
    q(f(X)) :- q(X). 

Program $P_3$: 
    p(a). 
    p(f(X)) :- q(f(X)). 
    q(a). 
    q(f(X)) :- p(f(X)). 

Fig. 2. An example of an incorrect unfold/fold transformation sequence 

Summary of Results In this paper, we develop a general transformation 
framework for definite logic programs parameterized by certain abstract mea- 
sures by suitably abstracting and extending the measures used in [7,8,20,21] (see 
Section 2). We relax the invariants needed in the proofs to permit approximation 
of measure values. This is the key idea that enables us to fold using multiple 
recursive clauses. We prove the correctness of transformations in the framework 
based only on the properties of the abstract measures. We show that various 
existing unfold/fold transformation systems can be derived from the framework 
by instantiating these abstract measures (see Section 3). We also show how the 
framework can be extended to include the Goal Replacement transformation 
(see Section 4). 

The parameterized framework presented in this paper is useful for under- 
standing the strengths and limitations of existing transformation systems. It 
also enables the construction of new unfold/fold systems. As evidence we obtain 
SCOUT (Strata and COunter based Unfold/fold Transformations), a transfor- 
mation system that permits disjunctive folding using recursive clauses. The de- 
velopment of SCOUT was based on two crucial observations made possible by 
the framework. First, when instantiating the framework to obtain the Kanamori- 
Fujita system, it is easy to see that the counters (the measure used in their 
system) may come from any linearly ordered set; this permits us to incorporate 
stratification into the counters to obtain a system that generalizes the extended 
Tamaki-Sato system [21] as well as the Kanamori-Fujita system. Secondly, the 
framework enables us to maintain approximate counters; we can hence generalize 
the combination of the Kanamori-Fujita and the extended Tamaki-Sato systems 
to fold using multiple recursive clauses. 

2 A Parameterized Transformation Framework 

We now describe our parameterized unfold/fold transformation framework and 
illustrate the abstractions by drawing analogies to the Kanamori-Fujita system. 

We assume familiarity with the standard notions of terms, models, substitu- 
tions, unification, most general unifier (mgu), definite clauses, SLD resolution, 
and proof trees [9]. We will use the following symbols (possibly with primes 
and subscripts): $P$ to denote a definite logic program; $M(P)$ its least Herbrand 
model; $C$ and $D$ for clauses; $A$, $B$ to denote atoms and literals; and $\sigma$ for mgu. 



2.1 Unfolding and Folding 

The unfolding and folding rules are defined as follows: 

Rule 1 (Unfolding) Let $C$ be a clause in $P_i$ and $A$ an atom in the body of $C$. 
Let $C_1, \ldots, C_m$ be the clauses in $P_i$ whose heads are unifiable with $A$ with most 
general unifiers $\sigma_1, \ldots, \sigma_m$. Let $C'_j$ be the clause that is obtained by replacing 
$A\sigma_j$ by the body of $C_j\sigma_j$ in $C\sigma_j$ ($1 \le j \le m$). Assign 
$(P_i - \{C\}) \cup \{C'_1, \ldots, C'_m\}$ to $P_{i+1}$. □ 

Rule 2 (Folding) Let $\{C_1, \ldots, C_m\} \subseteq P_i$ where $C_l$ denotes the clause 
$A \text{ :- } A_{l,1}, \ldots, A_{l,n_l}, A'_1, \ldots, A'_n$ and $\{D_1, \ldots, D_m\} \subseteq P_j$ ($j \le i$) where $D_l$ is the 
clause $B_l \text{ :- } B_{l,1}, \ldots, B_{l,n_l}$. Further, let: 

1. $\forall 1 \le l \le m\ \exists \sigma_l\ \forall 1 \le k \le n_l\ \ A_{l,k} = B_{l,k}\sigma_l$ 

2. $B_1\sigma_1 = B_2\sigma_2 = \cdots = B_m\sigma_m = B$ 

3. $D_1, \ldots, D_m$ are the only clauses in $P_j$ whose heads are unifiable with $B$. 

4. $\forall 1 \le l \le m$, $\sigma_l$ substitutes the internal variables¹ of $D_l$ to distinct variables 
which do not appear in $\{A, B, A'_1, \ldots, A'_n\}$. 

Then $P_{i+1} := (P_i - \{C_1, \ldots, C_m\}) \cup \{C'\}$ where $C' \equiv A \text{ :- } B, A'_1, \ldots, A'_n$. □ 

$D_1, \ldots, D_m$ are the folder clauses, $C_1, \ldots, C_m$ are the folded clauses, and $B$ is 
the folder atom. A folding step is conjunctive whenever both the folder and folded 
clauses are singleton sets and is disjunctive otherwise. Note that in the latter step 
a set of folded clauses is simultaneously replaced by a single clause using a set 
of folder clauses. We say that $P_0, P_1, \ldots, P_n$ is an unfold/fold transformation 
sequence if the program $P_{i+1}$ is obtained from $P_i$ ($i \ge 0$) by application of 
an unfold or a fold rule. Partial correctness of an unfold/fold transformation 
sequence (Theorem 1) is established by showing that a proof $T$ of any ground 
atom $A \in M(P_{i+1})$ has a corresponding proof $T'$ in $P_i$. This can be proved by 
induction on the structure of $T$. 
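To make the shape of the two rules concrete, the following sketch implements them for propositional (variable-free) programs only, so that every unifier is the identity and conditions 1, 2, and 4 of the folding rule hold trivially. This restriction, the clause representation (a head paired with a tuple of body atoms), and the function names are assumptions of the sketch; the measure-based side conditions introduced later are not checked.

```python
def unfold(program, clause, atom):
    """Replace `atom` in the body of `clause` by each body that defines it."""
    head, body = clause
    pos = body.index(atom)
    definitions = [b for (h, b) in program if h == atom]
    unfolded = [(head, body[:pos] + b + body[pos + 1:]) for b in definitions]
    return [c for c in program if c != clause] + unfolded

def fold(p_i, p_j, head, folder_atom, rest=()):
    """Fold the clauses  head :- body_l, rest  of p_i, one per clause
    folder_atom :- body_l  of p_j, into the single clause
    head :- folder_atom, rest."""
    # Condition 3: use *all* clauses of p_j that define the folder atom.
    folder_bodies = [b for (h, b) in p_j if h == folder_atom]
    folded = [(head, b + tuple(rest)) for b in folder_bodies]
    if not folder_bodies or any(c not in p_i for c in folded):
        raise ValueError("folding is not applicable")
    new_clause = (head, (folder_atom,) + tuple(rest))
    return [c for c in p_i if c not in folded] + [new_clause]

p0 = [("p", ("q",)), ("q", ("r",)), ("q", ("s", "t"))]
p1 = unfold(p0, ("p", ("q",)), "q")
p2 = fold(p1, p0, head="p", folder_atom="q")
```

Here `p1` contains the two clauses p :- r and p :- s, t, and the disjunctive fold with the two clauses for q taken from `p0` restores p :- q.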

Theorem 1 (Partial Correctness) Let $P_0, P_1, \ldots, P_i$ be a program transformation 
sequence where $M(P_j) = M(P_0)$ for all $0 \le j \le i$. If $P_{i+1}$ is obtained 
from $P_i$ by applying either unfolding or folding, then $M(P_{i+1}) \subseteq M(P_i)$. □ 

2.2 Measures, Measure-Consistent Proofs and Total Correctness 

Total correctness of an unfold/fold transformation sequence is established by 
inducting on some well-founded order to construct a proof in $P_{i+1}$ for any atom 
$A$ in $M(P_i)$. To see the subtleties in showing total correctness, consider 
transforming $P_i$ to $P_{i+1}$ using a conjunctive folding step. To construct a proof of $A$ 
(the head of the folded clause) in $P_{i+1}$, we need a proof of $B$ (the folder atom) 
in $P_{i+1}$. But the existence of such a proof can be established (by induction 
hypothesis) only if $B$ is less than $A$ in the well-founded order on which we are 
inducting. Note that if the folder clause is picked from $P_j$, $j < i$, we cannot use 
simple well-founded orders like size of proof trees in $P_i$, since the proof of $B$ in $P_i$ 
can be larger in size than the proof of $A$ in $P_i$. Here we develop an abstract 
formulation of certain well-founded orders (which we call measures) on which 
we can induct to establish total correctness. 

It is worth noting that we do not attempt to translate every proof of $A$ in 
$P_i$ to a proof of $A$ in $P_{i+1}$. Instead, following [8,20,21] we consider a “special 
proof” called strongly measure-consistent proof (see Definition 6) of $A$ in $P_i$ 
and construct a proof of $A$ in $P_{i+1}$. The induction proof for establishing total 
correctness is completed by showing that the proof of $A$ in $P_{i+1}$ thus constructed 
is itself strongly measure consistent. 



¹ Variables appearing in the body of a clause, but not its head 

Recall that irreversible folding steps need to be constrained in order to preserve 
the semantics. In order to enforce these constraints, we maintain some 
book-keeping information as we perform the transformations, formalized using 
the following notions of Measure structure, Atom measure, and Clause measure. 



Definition 1 (Measure Structure) A Measure Structure is a 4-tuple $\mu = 
\langle \mathcal{M}, \oplus, \prec, W \rangle$ where $\langle \mathcal{M}, \oplus \rangle$ is a commutative group with $0 \in \mathcal{M}$ as its identity 
element, $\prec$ is a linear order on $\mathcal{M}$, $\oplus$ is monotone w.r.t. $\prec$, and $W$ is a subset 
of $\{x \in \mathcal{M} \mid 0 \preceq x\}$, over which $\prec$ is well-founded. 

We will refer to $\mathcal{M}$, the first component of the measure structure, as the measure 
space. We let $\preceq$ denote $\prec$ or $=$. Moreover, we use $\ominus$ to denote the inverse 
operation of the group $\langle \mathcal{M}, \oplus \rangle$. We also use $\ominus$ as a binary operator, $a \ominus b$ meaning 
$a \oplus (\ominus b)$ (where $\ominus b$ is the inverse of $b$). The Kanamori-Fujita system [8] keeps 
track of integer counters. Thus the measure structure is $\langle \mathbb{Z}, +, <, \mathbb{N} \rangle$, where $\mathbb{Z}$ 
and $\mathbb{N}$ are the set of integers and natural numbers respectively, $+$ denotes integer 
addition, and $<$ is the arithmetic comparison operator. 
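A measure structure can be written down directly as a record of operations. The sketch below is illustrative: the class and field names are hypothetical, and the instance shown is the integer-counter structure described above for the Kanamori-Fujita system. The final helper only spot-checks the monotonicity requirement on sample values; it is not a proof.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class MeasureStructure:
    zero: Any                          # identity element of the group
    plus: Callable[[Any, Any], Any]    # the group operation
    negate: Callable[[Any], Any]       # unary inverse
    less: Callable[[Any, Any], bool]   # the strict linear order
    in_w: Callable[[Any], bool]        # membership in the well-founded subset W

    def minus(self, a, b):
        # Binary use of the inverse: a minus b means a plus (inverse of b).
        return self.plus(a, self.negate(b))

    def leq(self, a, b):
        return a == b or self.less(a, b)

INTEGER_COUNTERS = MeasureStructure(
    zero=0,
    plus=lambda a, b: a + b,
    negate=lambda a: -a,
    less=lambda a, b: a < b,
    in_w=lambda a: a >= 0,             # the natural numbers
)

def monotone_on(ms, samples):
    """Check on samples that a < b implies a + c < b + c."""
    return all(ms.less(ms.plus(a, c), ms.plus(b, c))
               for a in samples for b in samples for c in samples
               if ms.less(a, b))
```

Because the framework only depends on these operations, another instance (for example, counters drawn from a different linearly ordered group) could be supplied without changing any code written against `MeasureStructure`.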

Definition 2 (Atom Measure) An atom measure $\alpha$ of a program $P$ w.r.t. a 
measure structure $\mu$ is a partial function from the Herbrand base of $P$ to $W$ such 
that it is total on the least Herbrand model of $P$. For our purposes, it suffices to 
use the same atom measure for each program in a transformation sequence. 

In the Kanamori-Fujita system, the atom measure of any $P_i$ in the transformation 
sequence is the number of nodes in the shortest proof tree of $A$ in the initial 
program $P_0$. The proof of total correctness for folding will induct on the atom 
measure, relating the atom measure of $A$ (the head of the folded clauses) with 
the atom measure of $B$ (the folder atom). 
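For a finite ground program, an atom measure of this kind can be computed by a simple fixpoint iteration. The sketch below assumes a propositional program given as (head, body) pairs and interprets "shortest proof tree" as the proof tree with the fewest nodes; the function name is hypothetical.

```python
def smallest_proof_sizes(program):
    """Map each provable atom to the node count of its smallest proof tree."""
    sizes = {}
    changed = True
    while changed:
        changed = False
        for head, body in program:
            if all(b in sizes for b in body):
                # One node for the head plus the smallest proofs of the body.
                candidate = 1 + sum(sizes[b] for b in body)
                if candidate < sizes.get(head, float("inf")):
                    sizes[head] = candidate
                    changed = True
    return sizes

# Ground instances of p(X) :- q(X).  q(a).  q(f(X)) :- q(X).  up to f(f(a)).
program = [
    ("q0", ()), ("q1", ("q0",)), ("q2", ("q1",)),
    ("p0", ("q0",)), ("p1", ("q1",)), ("p2", ("q2",)),
    ("u", ("v",)),            # v is not provable, so u gets no measure
]
alpha = smallest_proof_sizes(program)
```

The result is a partial function on the Herbrand base that is total on the provable atoms, as the definition requires: unprovable atoms such as `u` simply do not appear in `alpha`.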

Definition 3 (Clause Measure) A clause measure $(\gamma_{lo}, \gamma_{hi})$ of a program $P$ 
w.r.t. a measure structure $\mu$ is a pair of total functions from clauses of $P$ to $\mathcal{M}$ 
such that $\forall C \in P\ \ \gamma_{lo}(C) \preceq \gamma_{hi}(C)$. 

In the Kanamori-Fujita system, $\gamma_{lo}$ and $\gamma_{hi}$ are the same and map each clause to 
its corresponding counter value. However, as we will see later, to allow disjunctive 
folding we will need the two distinct functions $\gamma_{lo}$ and $\gamma_{hi}$. Henceforth, we denote 
the clause measure of a program $P_i$ by $(\gamma^i_{lo}, \gamma^i_{hi})$. We will now develop the idea 
of “special proofs” mentioned earlier. For that purpose, we need the definition: 

Definition 4 (Ground Proof of an Atom) Let $T$ be a tree, each of whose 
nodes is labeled with a ground atom. Then $T$ is a ground proof in program $P$, if 
every node $A$ in $T$ satisfies the condition: $A \text{ :- } A_1, \ldots, A_n$ is a ground instance 
of a clause in $P$, where $A_1, \ldots, A_n$ ($n \ge 0$) are the children of $A$ in $T$. 

Consider transforming $P_i$ to $P_{i+1}$ by a folding step (see figure below). $C$ and $D$ 
are the folded and folder clauses respectively and $j \le i$. 

    Program $P_j$:       D  :  q :- q1, ..., qk 
    Program $P_i$:       C  :  p :- q1, ..., qk, qk+1, ..., qn 
    Program $P_{i+1}$:   C' :  p :- q, qk+1, ..., qn 



In order to show that $p \in M(P_i) \Rightarrow p \in M(P_{i+1})$ by induction on $\alpha$, we 
would like to show that $\alpha(q) \prec \alpha(p)$. The atoms p and q are related by what is 
shared between the bodies of the clauses $C$ and $D$. Hence we attempt to relate 
their measures via the measures of bodies of $C$ and $D$. Suppose $D$ satisfies: (i) 
$\alpha(q) \preceq \sum_{1 \le l \le k} \alpha(q_l)$, then we can relate $\alpha(q)$ to the sum of the measures of 
the body atoms of the folded clause $C$ (since $k \le n$). Further if $C$ satisfies: (ii) 
$\alpha(p) \succeq \sum_{1 \le l \le n} \alpha(q_l)$, then we can establish that $\alpha(q) \preceq \alpha(p)$. If either (i) or 
(ii) is a strict relationship then we can establish that $\alpha(q) \prec \alpha(p)$. Relations (i) 
and (ii) form the basis for the notions of weak and strong measure consistency. 
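Spelled out, the argument is a three-step chain. The middle step is not stated explicitly in the text; it relies on the assumptions that every atom measure lies in $W$ (so $0 \preceq \alpha(q_l)$), that $\oplus$ is monotone, and that the sums are taken with respect to $\oplus$:

```latex
\begin{align*}
\alpha(q) &\preceq \sum_{1 \le l \le k} \alpha(q_l)
          && \text{by (i)} \\
          &\preceq \sum_{1 \le l \le n} \alpha(q_l)
          && \text{since } k \le n \text{ and } 0 \preceq \alpha(q_l) \text{ for } k < l \le n \\
          &\preceq \alpha(p)
          && \text{by (ii)}
\end{align*}
```

If the first or the last step is strict, monotonicity of $\oplus$ makes the whole chain strict, which is what the induction on the atom measure needs.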

Definition 5 (Weakly Measure Consistent Proof) A ground proof $T$ in 
program $P_i$ is weakly measure consistent w.r.t. atom measure $\alpha$ and clause 
measure $(\gamma^i_{lo}, \gamma^i_{hi})$ if every ground instance $A \text{ :- } A_1, \ldots, A_n$ of a clause $C \in P_i$ used 
in $T$ satisfies $\alpha(A) \preceq \gamma^i_{hi}(C) \oplus \sum_{1 \le l \le n} \alpha(A_l)$. 



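Definition 6 (Strongly Measure Consistent Proof) A ground proof $T$ in 
program $P_i$ is strongly measure consistent w.r.t. atom measure $\alpha$ and clause 
measure $(\gamma^i_{lo}, \gamma^i_{hi})$ if every ground instance $A \text{ :- } A_1, \ldots, A_n$ of a clause $C \in P_i$ 
used in $T$ satisfies $\forall 1 \le l \le n\ \ \alpha(A_l) \prec \alpha(A)$ and $\alpha(A) \succeq \gamma^i_{lo}(C) \oplus \sum_{1 \le l \le n} \alpha(A_l)$. 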



Definition 7 (Measure Consistent Proof) A ground proof $T$ in program 
$P_i$ is said to be measure consistent w.r.t. atom measure $\alpha$ and clause measure 
$(\gamma^i_{lo}, \gamma^i_{hi})$ if it is strongly and weakly measure consistent w.r.t. $\alpha$ and $(\gamma^i_{lo}, \gamma^i_{hi})$. 

We point out that our abstract notion of measure consistency relaxes the con- 
crete notion of rank consistency of [8]. While rank consistency of [8] imposes 
a strict equality constraint on $\alpha(A)$, measure consistency only bounds it from 
above and below. As we will show later, this facilitates maintenance of approx- 
imate information. This is the central idea that permits us to do disjunctive 
folding using recursive clauses. For proving total correctness, we need : 

Definition 8 (Measure consistent Program) A program $P$ is measure consistent 
w.r.t. atom measure $\alpha$ and clause measure $(\gamma_{lo}, \gamma_{hi})$ if for all $A \in M(P)$, 
we have: (1) all ground proofs of $A$ in $P$ are weakly measure consistent w.r.t. $\alpha$ 
and $(\gamma_{lo}, \gamma_{hi})$; (2) $A$ has a ground proof in $P$ which is strongly measure consistent 
w.r.t. $\alpha$ and $(\gamma_{lo}, \gamma_{hi})$. 
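Weak and strong measure consistency of a single ground proof can be checked mechanically once the measures are integers. The sketch below assumes the integer-counter measure structure and the following reading of the two definitions: weak consistency bounds the measure of each head from above by the upper clause measure plus the measures of its children, and strong consistency requires every child to be strictly smaller than its parent and bounds the head from below by the lower clause measure plus the children. The proof-tree encoding and function names are hypothetical.

```python
# A proof tree node is (atom, clause_name, [child nodes]).

def clause_steps(proof):
    """Yield (atom, clause_name, child_atoms) for every node of the tree."""
    atom, clause, children = proof
    yield atom, clause, [child[0] for child in children]
    for child in children:
        yield from clause_steps(child)

def weakly_consistent(proof, alpha, gamma_hi):
    return all(alpha[a] <= gamma_hi[c] + sum(alpha[k] for k in kids)
               for a, c, kids in clause_steps(proof))

def strongly_consistent(proof, alpha, gamma_lo):
    return all(all(alpha[k] < alpha[a] for k in kids)
               and alpha[a] >= gamma_lo[c] + sum(alpha[k] for k in kids)
               for a, c, kids in clause_steps(proof))

# Proof of q1 using clause c2 (q1 :- q0) and the fact c1 (q0).
proof = ("q1", "c2", [("q0", "c1", [])])
alpha = {"q0": 1, "q1": 2}
exact = {"c1": 1, "c2": 1}        # lower and upper measures coincide
too_high_lo = {"c1": 1, "c2": 2}  # lower measure overestimates clause c2
too_low_hi = {"c1": 1, "c2": 0}   # upper measure underestimates clause c2
```

With `exact` used for both clause measures the proof is measure consistent; raising the lower measure or lowering the upper measure of `c2` breaks strong or weak consistency respectively, which shows in which direction each of the two measures may safely be approximated.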

We are now ready to define the abstract conditions on folding and constraints on
how the clause measures are to be updated after an unfold/fold step. For each
clause $C$ obtained by applying an unfold/fold transformation on program $P_i$, we
derive an upper bound on $\gamma_{lo}^{i+1}(C)$ and a lower bound on $\gamma_{hi}^{i+1}(C)$, denoted by
$GLB^{i+1}(C)$ and $LUB^{i+1}(C)$ respectively. We will see later that the conditions
on when the rules become applicable, as well as these bounds, will be based on
the requirements of the proof of total correctness.

We assume that for any atom $A$ (not necessarily ground), $\alpha_{min}(A)$ denotes
a lower bound on the measure of any provable ground instantiation of $A$, i.e.
$\alpha_{min}(A) \preceq \alpha(A\theta)$. We use $\alpha_{min}$ in the folding condition of Rule 4 below.

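Rule 3 (Measure Preserving Unfolding) Let $P_{i+1}$ be obtained from $P_i$ by
an unfolding transformation as described in Rule 1. Then, $\forall 1 \le j \le m$

$\gamma_{lo}^{i+1}(C'_j) \preceq GLB^{i+1}(C'_j) = \gamma_{lo}^{i}(C) \oplus \gamma_{lo}^{i}(C_j)$ (1)

$\gamma_{hi}^{i+1}(C'_j) \succeq LUB^{i+1}(C'_j) = \gamma_{hi}^{i}(C) \oplus \gamma_{hi}^{i}(C_j)$ (2)

The clause measures of all other clauses in $P_{i+1}$ are inherited from $P_i$. □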

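Rule 4 (Measure Preserving Folding) Let $P_{i+1}$ be obtained from $P_i$ by a
folding transformation as described in Rule 2, such that
$\forall 1 \le l \le m.\ \gamma_{hi}^{j}(D_l) \prec \gamma_{lo}^{i}(C_l) \oplus \sum_{1 \le k \le n} \alpha_{min}(A_k)$.^2 Then,

$\gamma_{lo}^{i+1}(C') \preceq GLB^{i+1}(C') = \min_{1 \le l \le m} (\gamma_{lo}^{i}(C_l) \ominus \gamma_{hi}^{j}(D_l))$ (3)

$\gamma_{hi}^{i+1}(C') \succeq LUB^{i+1}(C') = \max_{1 \le l \le m} (\gamma_{hi}^{i}(C_l) \ominus \gamma_{lo}^{j}(D_l))$ (4)

and the clause measures of all other clauses in $P_{i+1}$ are inherited from $P_i$. □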
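The bounds in equations (1) through (4) are plain arithmetic once a measure structure is fixed. The sketch below assumes integer measures; each clause is represented only by its pair (lower measure, upper measure), and the function names are illustrative assumptions.

```python
from typing import Sequence, Tuple

Measure = Tuple[int, int]  # (gamma_lo, gamma_hi) of one clause


def unfold_bounds(c: Measure, cj: Measure) -> Tuple[int, int]:
    """Equations (1) and (2): (GLB, LUB) for the clause obtained by unfolding C with C_j."""
    return c[0] + cj[0], c[1] + cj[1]


def fold_allowed(folded: Sequence[Measure], folders: Sequence[Measure],
                 alpha_min_rest: int) -> bool:
    """Condition of Rule 4: gamma_hi(D_l) < gamma_lo(C_l) + alpha_min_rest for every l,
    where alpha_min_rest is the sum of alpha_min over the atoms A_1..A_n of the rule."""
    return all(d[1] < c[0] + alpha_min_rest for c, d in zip(folded, folders))


def fold_bounds(folded: Sequence[Measure], folders: Sequence[Measure]) -> Tuple[int, int]:
    """Equations (3) and (4): (GLB, LUB) for the clause C' produced by folding."""
    glb = min(c[0] - d[1] for c, d in zip(folded, folders))
    lub = max(c[1] - d[0] for c, d in zip(folded, folders))
    return glb, lub
```

For example, folding clauses with measures (3, 3) and (4, 5) using folders with measures (1, 1) and (1, 2) gives GLB = min(3 - 1, 4 - 2) = 2 and LUB = max(3 - 1, 5 - 1) = 4. Any lower measure at most 2 and any upper measure at least 4 then satisfies the rule.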

It should be noted that the above rules do not prescribe unique values for upper 
and lower clause measures for the clauses generated by the transformations. 
Instead, they only specify bounds of these values; the values themselves are 
chosen only when instantiating the framework to a concrete system. 

Observe from the definition of atom measures that we can always assign 0 to
$\alpha_{min}$. However, by setting a more accurate estimate of $\alpha_{min}$, we can allow more
folding steps. As an example, consider any conjunctive folding step where the
folded clause $C \in P_i$ has more body atoms than the folder clause $D \in P_j$, and
$\gamma_{lo}^{i}(C) = \gamma_{hi}^{j}(D)$. Such a folding step will not be allowed if $\forall A\ \alpha_{min}(A) = 0$.
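The effect of the $\alpha_{min}$ estimate on this example can be checked numerically. This is a small sketch under assumed integer measures; the concrete values (equal clause measures of 2, one unfolded body atom remaining, and an estimate of 1 per atom) are hypothetical.

```python
def fold_condition(gamma_lo_c: int, gamma_hi_d: int, alpha_min_rest: int) -> bool:
    """Folding condition of Rule 4 for a single folded/folder pair over the integers."""
    return gamma_hi_d < gamma_lo_c + alpha_min_rest


gamma_lo_c = gamma_hi_d = 2   # assumed: equal measures, as in the example
remaining_atoms = 1           # assumed: one body atom of C is not folded

blocked = fold_condition(gamma_lo_c, gamma_hi_d, 0 * remaining_atoms)  # alpha_min = 0
allowed = fold_condition(gamma_lo_c, gamma_hi_d, 1 * remaining_atoms)  # alpha_min = 1
```

With $\alpha_{min} = 0$ the strict inequality 2 < 2 + 0 fails, so the step is rejected; with an estimate of 1 per remaining atom it becomes 2 < 3 and the step is accepted.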

The Need for Approximate Clause Measures: In the Kanamori-Fujita system, a
counter (corresponding to our clause measure) is associated with every clause.
Roughly speaking, the counter associated with a clause $C \in P_i$ where $C$ =
$A$ :- $A_1, \ldots, A_n$ indicates the number of interior nodes in the smallest proof tree
in $P_0$ that derives $A_1, \ldots, A_n$ from $A$. Thus, it is the amount saved (in terms
of proof tree size, compared to the smallest proof in $P_0$) whenever $C$ is used in
a proof in $P_i$. The folding rule is applicable provided the savings accrued in the
folded clause are greater than those in the folder clause.

To see why a single counter is inadequate for disjunctive folding, consider the 
following example: 

^2 Intuitively, if the clause measure of $C_l$ “exceeds” the clause measure of $D_l$ then we
can fold $C_l$ using $D_l$.

$P_{i+1}$ is obtained from $P_i$ by folding $\{C_3, C_4\}$ into $\{C_1, C_2\}$. Now, the savings
due to $C$ in a proof depends on whether $C_3$ or $C_4$ is used to resolve q
in that proof. Since this information is unknown at transformation time, we can
only keep approximate information about savings. In our framework we choose
to approximate the savings by the closed interval $[\gamma_{lo}, \gamma_{hi}]$.

We now have the necessary machinery for establishing total correctness of a 
sequence of unfold/fold transformations. 

Lemma 1 (Preserving Weak Measure Consistency) Let $P_0, \ldots, P_i$ be a
transformation sequence of measure consistent programs such that $M(P_0) = M(P_j)$
for all $0 \le j \le i$. Let $P_{i+1}$ be obtained from $P_i$ by applying
measure-preserving unfolding or measure-preserving folding. Then, all ground proofs of
$P_{i+1}$ are weakly measure consistent.

Proof Sketch. The proof proceeds by induction on the size of ground proofs
of $P_{i+1}$. Let $T$ be a ground proof of some ground atom $A$ in $P_{i+1}$, and let
$A$ :- $A_1, \ldots, A_n$ (where $n \ge 0$) be the ground instance of a clause $C \in P_{i+1}$ that
is used at the root of the proof $T$. Then the subproofs of $A_1, \ldots, A_n$ in $T$ are
weakly measure consistent by induction hypothesis.

Hence, it suffices to show that $\alpha(A) \preceq \gamma_{hi}^{i+1}(C) \oplus \sum_{1 \le l \le n} \alpha(A_l)$. To show
this, we consider three cases: (1) $C$ was inherited from $P_i$; (2) $C$ was obtained
from $P_i$ by unfolding; and (3) $C$ was obtained from $P_i$ by folding. In each of these
three cases, we can show the above inequality by assuming $M(P_{i+1}) \subseteq M(P_i)$
(which follows from theorem 1). □

Theorem 2 (Total Correctness) Let $P_0, P_1, \ldots, P_i$ be a transformation
sequence of measure consistent programs such that $M(P_0) = M(P_j)$ for all
$0 \le j \le i$. Let $P_{i+1}$ be obtained from $P_i$ by applying measure-preserving unfolding
or measure-preserving folding. Then, (i) $M(P_{i+1}) = M(P_i)$ and (ii) $P_{i+1}$ is a
measure-consistent program.

Proof. By theorem 1, we have $M(P_{i+1}) \subseteq M(P_i)$, and by lemma 1 we know that
all ground proofs of $P_{i+1}$ are weakly measure consistent. Hence it is sufficient
to prove that (1) $M(P_i) \subseteq M(P_{i+1})$ and (2) $\forall A \in M(P_{i+1})$, $A$ has a strongly
measure consistent proof in $P_{i+1}$.

Consider any ground atom $A \in M(P_i)$. Since $P_i$ is measure consistent, $A$ has
a strongly measure consistent proof $T$ in $P_i$. We now construct a strongly measure
consistent proof $T'$ of $A$ in $P_{i+1}$. Construction of $T'$ proceeds by induction on
atom measures. Let $C$ be a clause used at the root of $T$. Let $A$ :- $A_1, \ldots, A_n$
(where $n \ge 0$) be the ground instantiation of $C$ at the root of $T$. Since $T$ is
strongly measure consistent, $\alpha(A_l) \prec \alpha(A)$ for all $1 \le l \le n$. Hence, by the
induction hypothesis, we have strongly measure consistent proofs $T'_1, \ldots, T'_n$ of
$A_1, \ldots, A_n$ in $P_{i+1}$. We construct $T'$ by considering the following cases:

Case 1: $C$ is inherited from $P_i$ into $P_{i+1}$.

$T'$ is constructed with $A$ :- $A_1, \ldots, A_n$ at its root and $T'_1, \ldots, T'_n$ as its children.

This proof $T'$ is strongly measure consistent.

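Case 2: $C$ is unfolded.

Let $A_1$ be the atom in the body of $C$ which is unfolded. Let the clause used to
resolve $A_1$ in $T$ be $C_1$ and the ground instance of $C_1$ used be
$A_1$ :- $A_{1,1}, \ldots, A_{1,l_1}$. By definition of unfolding,
$A$ :- $A_{1,1}, \ldots, A_{1,l_1}, A_2, \ldots, A_n$ is a ground instance of
a clause $C'_1$ in $P_{i+1}$ with $\gamma_{lo}^{i+1}(C'_1) \preceq \gamma_{lo}^{i}(C) \oplus \gamma_{lo}^{i}(C_1)$. Also,
$\alpha(A_{1,j}) \prec \alpha(A_1) \prec \alpha(A)$ for all $1 \le j \le l_1$. Thus, we have strongly measure
consistent proofs $T'_{1,1}, \ldots, T'_{1,l_1}$ of $A_{1,1}, \ldots, A_{1,l_1}$ in $P_{i+1}$. The proof $T'$ is
now constructed by applying $A$ :- $A_{1,1}, \ldots, A_{1,l_1}, A_2, \ldots, A_n$ at the root, and
putting $T'_{1,1}, \ldots, T'_{1,l_1}, T'_2, \ldots, T'_n$ as the children. Since $T$ is strongly measure
consistent,

$\alpha(A) \succeq \gamma_{lo}^{i}(C) \oplus \sum_{1 \le j \le n} \alpha(A_j)$ and $\alpha(A_1) \succeq \gamma_{lo}^{i}(C_1) \oplus \sum_{1 \le j \le l_1} \alpha(A_{1,j})$

$\Rightarrow (\alpha(A) \oplus \alpha(A_1)) \succeq \gamma_{lo}^{i}(C) \oplus \gamma_{lo}^{i}(C_1) \oplus \sum_{1 \le j \le n} \alpha(A_j) \oplus \sum_{1 \le j \le l_1} \alpha(A_{1,j})$

$\Rightarrow \alpha(A) \succeq \gamma_{lo}^{i+1}(C'_1) \oplus \sum_{2 \le j \le n} \alpha(A_j) \oplus \sum_{1 \le j \le l_1} \alpha(A_{1,j})$

Hence, $T'$ is a strongly measure consistent proof in $P_{i+1}$.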

Case 3: $C$ is folded.

Let $C$ (potentially with other clauses) be folded, using folder clauses from $P_j$,
$j \le i$, to clause $C'$ in $P_{i+1}$. Assume that $A_1, \ldots, A_k$ are the instances of the folded
atoms in $C$. Then, $C'$ has a ground instance of the form $A$ :- $B, A_{k+1}, \ldots, A_n$
where $B$ :- $A_1, \ldots, A_k$ is a ground instance of a folder clause $D \in P_j$.^3 Since
$M(P_i) = M(P_j)$ and $A_1, \ldots, A_k$ are provable in $P_i$ they must also be provable
in $P_j$. Moreover, since $D \in P_j$, $B \in M(P_j) = M(P_i)$. Since $P_j$ is measure
consistent, $\alpha(B) \preceq \gamma_{hi}^{j}(D) \oplus \sum_{1 \le l \le k} \alpha(A_l)$.

Now, by the strong measure consistency of $T$,

$\alpha(A) \succeq \gamma_{lo}^{i}(C) \oplus \sum_{1 \le l \le k} \alpha(A_l) \oplus \sum_{k+1 \le l \le n} \alpha(A_l)$

$\succeq \gamma_{lo}^{i}(C) \oplus (\alpha(B) \ominus \gamma_{hi}^{j}(D)) \oplus \sum_{k+1 \le l \le n} \alpha(A_l)$ (*)

$\succeq (\gamma_{lo}^{i}(C) \ominus \gamma_{hi}^{j}(D)) \oplus \alpha(B) \oplus \sum_{k+1 \le l \le n} \alpha_{min}(A_l)$

$\succ \alpha(B)$ (by condition of measure preserving folding)

Now, by induction hypothesis, $B$ has a strongly measure consistent proof $T'_B$ in
$P_{i+1}$. We construct $T'$, the proof of $A$ in $P_{i+1}$, with $A$ :- $B, A_{k+1}, \ldots, A_n$ at its
root, and $T'_B, T'_{k+1}, \ldots, T'_n$ as its children. To show that $T'$ is strongly measure
consistent, note that $\gamma_{lo}^{i+1}(C') \preceq (\gamma_{lo}^{i}(C) \ominus \gamma_{hi}^{j}(D))$ according to the definition of
measure preserving folding, as $C$ and $D$ are folded and folder clauses. Combining
this with (*) we get,

$\alpha(A) \succeq \gamma_{lo}^{i+1}(C') \oplus \alpha(B) \oplus \sum_{k+1 \le l \le n} \alpha(A_l)$

This completes the proof. □

^3 Recall that in the folding transformation, all clauses in $P_j$ whose head is unifiable
with $B$ are folder clauses.

Note that by applying measure preserving unfolding/folding to program $P_i$,
we can generate a clause which is also inherited from $P_i$. It is straightforward
to adjust the clause measures of $P_{i+1}$ so that $P_{i+1}$ remains
measure consistent (details are omitted).

3 Constructing Concrete Unfold/Fold Systems by 
Instantiating the Framework 

To construct a concrete unfold/fold transformation system from our abstract 
framework, the following parameters need to be instantiated : 

1. a measure structure $\mu$;

2. atom measure $\alpha$ and $\alpha_{min}$;

3. clause measure $(\gamma_{lo}^{0}, \gamma_{hi}^{0})$ for clauses in the initial program $P_0$ such that $P_0$
is measure consistent; and

4. functions to compute the clause measure of new clauses obtained by the
transformations such that they satisfy the constraints imposed by
equations (1) through (4) (see Rules 3 and 4).

Note that there are no further proof obligations. Once the above four elements 
are defined, total correctness of the transformation system is guaranteed by the 
framework. 
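The four parameters can be pictured as a small record. The sketch below is only an illustration under stated assumptions: the class and field names are hypothetical, atoms and clauses are represented by strings, and only $\alpha_{min}$ is stored for parameter 2 because it is what the folding condition of Rule 4 refers to. The integer example mirrors a counter-based instantiation.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Generic, Tuple, TypeVar

M = TypeVar("M")


@dataclass(frozen=True)
class MeasureStructure(Generic[M]):
    """Parameter 1: identity, group operation, its inverse, and the linear order."""
    zero: M
    add: Callable[[M, M], M]
    sub: Callable[[M, M], M]
    leq: Callable[[M, M], bool]


@dataclass(frozen=True)
class Instantiation(Generic[M]):
    structure: MeasureStructure[M]                 # parameter 1
    alpha_min: Callable[[str], M]                  # parameter 2 (lower bound only)
    initial_measure: Callable[[str], Tuple[M, M]]  # parameter 3: clause -> (lo, hi)
    choose: Callable[[M, M], Tuple[M, M]]          # parameter 4: (GLB, LUB) -> (lo, hi)

    def respects_bounds(self, glb: M, lub: M) -> bool:
        """Check the constraints of equations (1)-(4): lo <= GLB and LUB <= hi."""
        lo, hi = self.choose(glb, lub)
        return self.structure.leq(lo, glb) and self.structure.leq(lub, hi)


integers = MeasureStructure(0, operator.add, operator.sub, operator.le)
counter_style = Instantiation(
    structure=integers,
    alpha_min=lambda atom: 1,
    initial_measure=lambda clause: (1, 1),
    choose=lambda glb, lub: (glb, lub),   # keep the bounds exactly
)
```

A `choose` function that returned a lower measure above GLB, or an upper measure below LUB, would fail `respects_bounds` and therefore fall outside the framework's guarantee.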

3.1 Existing Unfold/Fold Systems 

We first show how our framework can be instantiated to obtain the Kanamori- 
Fujita and the extended Tamaki-Sato systems. To the best of our knowledge, 
these are the only two existing systems that allow folding using recursive clauses. 
However in both of these systems folding is conjunctive. 

The Kanamori-Fujita System [8]: This system can be obtained as an in- 
stance of our framework as follows: 

1. $\mu = (\mathbb{Z}, +, \le, \mathbb{N})$. This measure structure corresponds to the use of integer
counters in [8].

2. $\alpha(A)$ = number of nodes in the smallest proof of $A$ in $P_0$, and for any atom
$A$, $\alpha_{min}(A) = 1$. Thus, $\alpha(A)$ denotes the rank of $A$ described in [8].

3. $\forall C \in P_0$, $\gamma_{lo}^{0}(C) = \gamma_{hi}^{0}(C) = 1$. Since all clause measures are 1, it follows
immediately from the definition of atom measures that the smallest proofs
of any ground goal $G$ are strongly measure consistent, and all proofs in $P_0$
are weakly measure consistent. Hence $P_0$ is measure consistent.

4. $\forall C \in P_{i+1} - P_i$ (i.e., new clauses in $P_{i+1}$), $\gamma_{lo}^{i+1}(C) = GLB^{i+1}(C)$ and
$\gamma_{hi}^{i+1}(C) = LUB^{i+1}(C)$. Under the given measure structure, it is immediate
that the above definition is identical to the computation on counters in [8].

Furthermore, the measure preserving folding rule (Rule 4) is applied only when
both folder and folded clauses are singleton sets. It is easy to see a one-to-one
correspondence between the conditions on unfold/fold transformations of the
above instantiation and the Kanamori-Fujita system.

The Extended Tamaki-Sato System [21]: In this system all the predicate
symbols are partitioned into $n$ strata. In the initial program a predicate from
stratum $j$ is defined using only predicates from strata $\le j$. We can obtain this
system as an instance of our framework as follows:

1. $\mu = (\mathbb{Z}^n, \oplus, \preceq, \mathbb{N}^n)$ where $\oplus$ denotes coordinate-wise integer addition of
$n$-tuples of integers, and $\preceq$ denotes the lexicographic $\le$ order over $n$-tuples
of integers. The $n$-tuples in the measure structure will correspond to the $n$
strata of the original program.

2. $\alpha(A) = \min(\{w(T) \mid T$ is a proof of $A$ in $P_0\})$, where $w(T)$ is the weight of
the proof $T$ defined as an $n$-tuple $(w_1, \ldots, w_n)$ such that $\forall 1 \le j \le n$, $w_j$ is
the number of nodes of predicates from stratum $j$ in $T$. $\alpha(A)$ corresponds to
the notion of weight-tuple measure of $A$ defined in [21].

For any atom $A$, $\alpha_{min}(A) = 0 = (0, \ldots, 0)$.

3. $\forall C \in P_0$, $\gamma_{lo}^{0}(C) = \gamma_{hi}^{0}(C) = (w_1, \ldots, w_n)$, where $C$ = $A$ :- $A_1, \ldots$, and
for $1 \le j \le n$, $w_j = 1$ if the predicate symbol of $A$ is from stratum $j$, and 0
otherwise.

For any $A \in M(P_0)$, the proof $T$ that defines $\alpha(A)$ (item 2 above) is strongly
measure consistent. Weak measure consistency of ground proofs in $P_0$ is
established by induction on their size.

4. $\forall C \in P_{i+1} - P_i$, $\gamma_{hi}^{i+1}(C) = LUB^{i+1}(C)$ and $\gamma_{lo}^{i+1}(C) = approx(GLB^{i+1}(C))$.

The function $approx$ reduces a measure as follows. Let $u = (u_1, \ldots, u_n)$
and $k_{min}$ be the smallest index $k$ such that $u_k > 0$. Then $approx(u) =
(u'_1, \ldots, u'_n)$ where $u'_{k_{min}} = 1$ and $u'$ is 0 elsewhere.
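The function $approx$ described in item 4 is easy to state in code. The following is a minimal sketch; raising an error when no entry is positive is an assumption of this sketch, since the text only describes the case where such an index exists.

```python
from typing import Tuple


def approx(u: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return the tuple with 1 at the smallest index k where u[k] > 0, and 0 elsewhere."""
    for k, u_k in enumerate(u):
        if u_k > 0:
            return tuple(1 if j == k else 0 for j in range(len(u)))
    # Assumption of this sketch: the case without a positive entry is rejected.
    raise ValueError("approx is only described for tuples with a positive entry")
```

For instance, `approx((0, 3, 2))` yields `(0, 1, 0)`. For tuples of non-negative integers the result is never lexicographically larger than the input, which is what lets it serve as a relaxed lower clause measure.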

As in the Kanamori-Fujita system, here also the measure preserving folding 
rule is applied only when both folder and folded clauses are singleton sets. 

To establish the correspondence between the above instantiation and the
extended Tamaki-Sato system, recall that the latter associates a descent level
with each clause of every program in a transformation sequence. If a clause $C$ in
$P_i$ has the descent level $k$, then with the above instantiation, $\gamma_{lo}^{i}(C) = (l_1, \ldots, l_n)$
where $l_k = 1$ and 0 elsewhere; i.e. the only non-zero entry in its lower clause
measure appears in the $k$-th position. Thus our lower clause measure precisely
captures the information that is kept track of by the extended Tamaki-Sato
system.

Assigning Measure Structures and Clause Measures: Observe that our
framework does not prescribe exact values to the clause measures. Instead it bounds
the clause measures from above and below. So an important aspect of our in- 
stantiation involves assigning values to the clause measures that satisfy these 
constraints. From an abstract point of view, the Kanamori-Fujita system uses 
a relatively coarse measure space (Z) but within this space it maintains accu- 
rate clause measures (integer counters). Our instantiation reflects this by not 
relaxing the bounds while updating the clause measures (see step 4 of the
instantiation). On the other hand, the extended Tamaki-Sato system uses a more
fine-grained measure space ($\mathbb{Z}^n$). But this measure space is not completely
utilized since clause measures are the descent level of clauses, which can be simply
represented by an integer. Therefore in step 4 of our instantiation we
accordingly loosened the bound. As far as the Gergatsoulis-Katzouraki [7] and original
Tamaki-Sato systems [20] are concerned, first note that they do not permit fold- 
ing using recursive clauses. These systems use coarse measure spaces. Moreover 
they do not even fully utilize these measure spaces as is evident from the lesser 
amount of book keeping performed by them. By choosing a coarse measure struc- 
ture and relaxing the bounds along lines similar to the extended Tamaki-Sato 
system we have been able to instantiate these two systems as well. Details are 
omitted. 

3.2 SCOUT: A New Unfold/Fold System

We now construct SCOUT, an unfold/fold transformation system for definite 
logic programs that allows disjunctive folding using recursive clauses. It incor- 
porates the notion of strata from the extended Tamaki-Sato system into the 
counters of the Kanamori-Fujita system. Thus with every clause it maintains a 
pair of stratified counters as the clause measure. The instantiation is as follows. 
We assume that the predicate symbols appearing in the initial program $P_0$ are
partitioned into $n$ strata, as in the extended Tamaki-Sato system.

1. $\mu = (\mathbb{Z}^n, \oplus, \preceq, \mathbb{N}^n)$ where $\oplus$ denotes coordinate-wise integer addition of
$n$-tuples of integers, and $\preceq$ denotes the lexicographic $\le$ order over $n$-tuples of
integers.

2. $\alpha(A)$ is defined exactly as in the instantiation of the extended Tamaki-Sato
system above. For any atom $A$ we set $\alpha_{min}(A) = (w_1, \ldots, w_n)$ where $w_j = 1$
if $A$ is from stratum $j$ and 0 elsewhere.

3. Clause measure of clauses in $P_0$ is defined exactly as in the instantiation of
the extended Tamaki-Sato system above. Therefore the proofs of measure
consistency are also identical.

4. $\forall C \in P_{i+1} - P_i$, $\gamma_{lo}^{i+1}(C) = GLB^{i+1}(C)$ and $\gamma_{hi}^{i+1}(C) = LUB^{i+1}(C)$.
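The stratified counters of SCOUT can be sketched with integer tuples, since Python compares tuples lexicographically, which matches the order used in item 1. The helper names below and the numbering of strata from 1 to $n$ are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec = Tuple[int, ...]
Measure = Tuple[Vec, Vec]  # (lower clause measure, upper clause measure)


def vadd(u: Vec, v: Vec) -> Vec:
    """Coordinate-wise addition of n-tuples."""
    return tuple(a + b for a, b in zip(u, v))


def vsub(u: Vec, v: Vec) -> Vec:
    """Coordinate-wise subtraction, the inverse of vadd."""
    return tuple(a - b for a, b in zip(u, v))


def stratum_unit(j: int, n: int) -> Vec:
    """Tuple with 1 at stratum j (1-based) and 0 elsewhere, as used for alpha_min."""
    return tuple(1 if i == j else 0 for i in range(1, n + 1))


def scout_fold_measures(folded: Sequence[Measure], folders: Sequence[Measure]) -> Measure:
    """Item 4 applied to a disjunctive fold: take GLB and LUB of equations (3)-(4) exactly."""
    glb = min(vsub(c_lo, d_hi) for (c_lo, _), (_, d_hi) in zip(folded, folders))
    lub = max(vsub(c_hi, d_lo) for (_, c_hi), (d_lo, _) in zip(folded, folders))
    return glb, lub
```

Here `min` and `max` rely on the built-in lexicographic tuple order, so no custom comparison is needed. A fold over two clause pairs then produces one pair of stratified counters for the new clause.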

SCOUT provides a solution to two important (and orthogonal) problems 
that have thus far remained open: folding using clauses that have disjunctions 
as well as recursion, and combining the stratification-based (extended) Tamaki- 
Sato system with the counter-based Kanamori-Fujita system thereby obtaining 
a single system that strictly subsumes either of them even when restricted to 
conjunctive folding (See [13] for a formal proof of this claim). 

It is interesting to note that by simple inspection of the instantiations, one can 
see that when the number of strata is 1 and only conjunctive folding is permitted, 
SCOUT collapses to the Kanamori-Fujita system. Collapsing SCOUT to other 
existing unfold/fold systems by varying the number of strata and extending the 
parameters (e.g. measure structure) remains an interesting open problem. 

4 Goal Replacement 

Augmenting an unfold/fold transformation system with the goal replacement 
rule makes it more powerful. In this section we incorporate goal replacement into
our parameterized framework. Goal replacement allows semantically equivalent
conjunctions of atoms to be freely interchanged. We formally define it below.
For a conjunction of atoms $A_1, \ldots, A_n$, we use the notation $vars(A_1, \ldots, A_n)$ to
denote the set of variables in $A_1, \ldots, A_n$.

Rule 5 (Goal Replacement) Let $C$ be a clause $A$ :- $A_1, \ldots, A_k, G$ in $P_i$, and
$G'$ be an atom such that $vars(G) = vars(G') \subseteq vars(A, A_1, \ldots, A_k)$. Suppose
for all ground instantiations $\theta$ of $G, G'$ we have $P_i \vdash G\theta \Leftrightarrow P_i \vdash G'\theta$. Then
$P_{i+1} := (P_i - \{C\}) \cup \{C'\}$ where $C'$ = $A$ :- $A_1, \ldots, A_k, G'$. □

Note that although we replace a single atom G by another atom G' (where G and 
G' do not contain any internal variables), we can replace conjunctions of atoms 
using a sequence of folding, goal replacement and unfolding transformations. 

The above transformation is partially correct (a formal proof appears in [13]). 
However, if goal replacement is applied to a measure consistent program Pi it 
is totally correct. But then we also need to ensure that the resulting program 
Pi+i is measure consistent. If this is ensured, then even if goal replacement is 
interleaved with irreversible folding total correctness will be preserved. Formally, 



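Rule 6 (Measure Preserving Goal Replacement) Suppose program $P_{i+1}$
is obtained from program $P_i$ by applying the goal replacement transformation
as described in Rule 5. Let there exist $\delta, \delta' \in \mathcal{M}$ (where the measure structure is
$\mu = (\mathcal{M}, \oplus, \preceq, W)$) such that for all ground instantiations $\theta$ of $G, G'$, we have:

(i) $\delta \preceq \alpha(G\theta) \ominus \alpha(G'\theta) \preceq \delta'$ (ii) $\gamma_{lo}^{i}(C) \oplus \delta \oplus \sum_{1 \le p \le k} \alpha_{min}(A_p) \succ 0$. Then

$\gamma_{lo}^{i+1}(C') \preceq \gamma_{lo}^{i}(C) \oplus \delta$ (5)

$\gamma_{hi}^{i+1}(C') \succeq \gamma_{hi}^{i}(C) \oplus \delta'$ (6)

The clause measures of the other clauses of $P_{i+1}$ are inherited from $P_i$. □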
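Conditions (i) and (ii) and the bounds (5) and (6) can be written out for integer measures. This is a sketch under that assumption; the function names are hypothetical. Condition (i) quantifies over all ground instances, so the check below only covers whatever sample of measure pairs is supplied.

```python
from typing import Iterable, Sequence, Tuple


def delta_bounds_hold(delta: int, delta_prime: int,
                      pairs: Iterable[Tuple[int, int]]) -> bool:
    """Condition (i) on sampled pairs (alpha(G theta), alpha(G' theta))."""
    return all(delta <= a_g - a_g2 <= delta_prime for a_g, a_g2 in pairs)


def goal_replacement_allowed(gamma_lo_c: int, delta: int,
                             alpha_min_body: Sequence[int]) -> bool:
    """Condition (ii): gamma_lo(C) + delta + sum of alpha_min(A_p) must be strictly positive."""
    return gamma_lo_c + delta + sum(alpha_min_body) > 0


def goal_replacement_bounds(gamma_lo_c: int, gamma_hi_c: int,
                            delta: int, delta_prime: int) -> Tuple[int, int]:
    """Right-hand sides of (5) and (6): upper bound for the new lower measure,
    lower bound for the new upper measure."""
    return gamma_lo_c + delta, gamma_hi_c + delta_prime
```

With clause measures (1, 1), $\delta = -1$ and $\delta' = 2$, the new lower measure may be at most 0 and the new upper measure must be at least 3. Condition (ii) then holds only if the remaining body atoms contribute a positive $\alpha_{min}$.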

We now present a formal proof of total correctness and preservation of measure 
consistency of the above rule. 

Theorem 3 Let $P_{i+1}$ be derived from $P_i$ by applying measure preserving goal
replacement as described in rule 6. If $P_i$ is measure consistent, then $M(P_i) =
M(P_{i+1})$ and $P_{i+1}$ is also measure consistent.

Proof. Since measure preserving goal replacement is a special case of the goal
replacement transformation in rule 5, we have $M(P_{i+1}) \subseteq M(P_i)$ by partial
correctness of rule 5. Therefore it is sufficient to prove that: (1) all ground proofs
of $P_{i+1}$ are weakly measure consistent, (2) $M(P_i) \subseteq M(P_{i+1})$, and (3) $\forall B \in M(P_{i+1})$
there exists a strongly measure consistent proof of $B$ in $P_{i+1}$. We prove proof
obligation (1) separately. Proof obligations (2) and (3) are proved by showing
that $\forall B \in M(P_i)$ there exists a strongly measure consistent proof of $B$ in $P_{i+1}$.
This is sufficient since we know $M(P_{i+1}) \subseteq M(P_i)$.

First, we prove that all ground proofs of $P_{i+1}$ are weakly measure consistent.
The proof proceeds by induction on the size of ground proofs in $P_{i+1}$. Let $T$ be
a ground proof of a ground atom $B$ in $P_{i+1}$. If the clause used at the root of $T$
is not the new clause $C'$, then the proof follows by induction hypothesis and the
measure consistency of $P_i$. If the clause used at the root of $T$ is $C'$, then let the
ground instance of $C'$ used at the root of $T$ be $A\theta$ :- $A_1\theta, \ldots, A_k\theta, G'\theta$. By
induction hypothesis, the proofs of $A_1\theta, \ldots, A_k\theta, G'\theta$ in $T$ are weakly measure
consistent. It suffices to show that
$\alpha(A\theta) \preceq \gamma_{hi}^{i+1}(C') \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus \alpha(G'\theta)$. Now,
$G'\theta \in M(P_{i+1}) \Rightarrow G'\theta \in M(P_i)$. Hence by rule 5 we have $G\theta \in M(P_i)$. Also,
$\forall 1 \le l \le k$, $A_l\theta \in M(P_i)$ (as $M(P_{i+1}) \subseteq M(P_i)$). Then, $A\theta$ :- $A_1\theta, \ldots, A_k\theta, G\theta$ is
a ground instantiation of $C$ which appears at the root of some ground proof in
$P_i$. Since $P_i$ is measure consistent we have

$\alpha(A\theta) \preceq \gamma_{hi}^{i}(C) \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus \alpha(G\theta)$

$\preceq \gamma_{hi}^{i}(C) \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus (\alpha(G'\theta) \oplus \delta')$

$\preceq \gamma_{hi}^{i+1}(C') \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus \alpha(G'\theta)$

Now, we prove that $\forall B \in M(P_i)$ there is a strongly measure consistent proof
of $B$ in $P_{i+1}$. Since $P_i$ is measure consistent, it suffices to translate a strongly
measure consistent proof $T$ of $B$ in $P_i$ to a strongly measure consistent proof $T'$
of $B$ in $P_{i+1}$ for all $B \in M(P_i)$. We do this translation by induction on the atom
measures. If the clause used at the root of $T$ is not $C$ (where $C$ is the clause in
$P_i$ that is replaced) then the proof follows from the definition of strong measure
consistency and induction hypothesis. Let $C$ be the clause used at the root of $T$
(a strongly measure consistent proof of $A\theta$ in $P_i$) and let $A\theta$ :- $A_1\theta, \ldots, A_k\theta, G\theta$
be the ground instance of $C$ used. Then, by strong measure consistency of $T$,
$\alpha(A_l\theta) \prec \alpha(A\theta)$ for all $1 \le l \le k$. By induction hypothesis, we then have strongly
measure consistent ground proofs $T'_1, \ldots, T'_k$ of $A_1\theta, \ldots, A_k\theta$ in $P_{i+1}$. Also, by
strong measure consistency of $T$,

$\alpha(A\theta) \succeq \gamma_{lo}^{i}(C) \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus \alpha(G\theta)$

$\succeq \gamma_{lo}^{i}(C) \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus (\alpha(G'\theta) \oplus \delta)$ (*)

$\succeq (\gamma_{lo}^{i}(C) \oplus \sum_{1 \le l \le k} \alpha_{min}(A_l\theta) \oplus \delta) \oplus \alpha(G'\theta)$

$\succ \alpha(G'\theta)$ (by condition (ii) of rule 6)

Then, by induction hypothesis, $G'\theta$ has a proof $T'_{G'\theta}$ in $P_{i+1}$. The ground proof
$T'$ is constructed with $A\theta$ :- $A_1\theta, \ldots, A_k\theta, G'\theta$ at the root (this is a ground
instance of $C'$, the new clause in $P_{i+1}$) and $T'_1, \ldots, T'_k, T'_{G'\theta}$ as its children. To
show that this proof $T'$ is measure consistent, note that $\gamma_{lo}^{i+1}(C') \preceq \gamma_{lo}^{i}(C) \oplus \delta$.
Combining this with (*), we get

$\alpha(A\theta) \succeq \gamma_{lo}^{i+1}(C') \oplus \sum_{1 \le l \le k} \alpha(A_l\theta) \oplus \alpha(G'\theta)$

This completes the proof. □

Observe that, similar to the goal replacement transformation in [8,20,21], the
conditions under which rule 6 may be applied are not testable at transformation
time. For testability we need to (1) determine whether $G$ and $G'$ are semantically
equivalent, and (2) estimate $\delta$ and $\delta'$ such that the clause measures of $P_{i+1}$ can
be computed.

Semantic equivalence is undecidable in general and can be conservatively
approximated using program analysis. To estimate $\delta$ and $\delta'$ observe that any $\delta'$
which dominates the atom measure of all ground atoms satisfies the conditions of
Rule 6. However, such a $\delta'$ may not always exist in the given measure structure.
In such cases, we can extend the measure structure $\mu = (\mathcal{M}, \oplus, \preceq, W)$ to
$(\mathbb{Z} \times \mathcal{M}, \oplus', \preceq', \mathbb{N} \times W)$, where $\forall z_1, z_2 \in \mathbb{Z}$ and $\forall m_1, m_2 \in \mathcal{M}$,
$(z_1, m_1) \oplus' (z_2, m_2) = (z_1 + z_2, m_1 \oplus m_2)$, and $\preceq'$ is the lexicographic ordering of
pairs from $\mathbb{Z} \times \mathcal{M}$. Atom measures in this extended measure space are of the form
$(0, w)$ (where $w \in W$). We set $\delta' = (1, 0)$, which is lexicographically greater than
all atom measures. Also, in certain cases we can define a lower bound of $\delta$ as
follows. Let $B$ be the atom in the body of a clause in $P_i$ that is replaced and let
$\{C_1, \ldots, C_n\}$ be the clauses in $P_i$ that unify with $B$. Then,
$\delta \preceq \min_{1 \le k \le n} (\gamma_{lo}^{i}(C_k) \ominus \alpha_{min}(hd(C_k)))$,
where $hd(C_k)$ is the head atom of $C_k$ (for details see [14]).
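The extension of the measure structure by an extra integer component can be illustrated directly. In the sketch below the original structure is assumed to be the integers, so extended measures are pairs of integers; the names `ext_add`, `embed` and `DELTA_PRIME` are illustrative assumptions.

```python
from typing import Tuple

Ext = Tuple[int, int]  # (z, m): extra integer component, then the original measure


def ext_add(a: Ext, b: Ext) -> Ext:
    """Component-wise group operation of the extended structure."""
    return (a[0] + b[0], a[1] + b[1])


def embed(w: int) -> Ext:
    """Atom measures of the original structure live at level 0 of the extension."""
    return (0, w)


# (1, 0) exceeds every embedded atom measure under Python's lexicographic tuple order.
DELTA_PRIME: Ext = (1, 0)
```

Because the first component is compared first, `embed(w) < DELTA_PRIME` holds for every `w`, however large, which is what makes `(1, 0)` a usable choice of $\delta'$ when no dominating element exists in the original structure.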

The above steps define a procedure to add goal replacement to any arbitrary 
unfold/fold system instantiated in our framework. More importantly, this is done 
by simply manipulating the measures; the proofs of correctness of the augmented 
transformation system follow immediately from the proofs of our framework. 



5 Conclusion 



The development of a parameterized framework for unfold/fold transformations 
has several important implications. It enables us to compare existing transfor- 
mation systems and modify them without redoing the correctness proofs (e.g., 
extending measures for goal replacement in Section 4). It also facilitates the 
development of new transformation systems. For instance, we derived SCOUT
which permits folding using multiple recursive clauses. Such a transformation
system is particularly important for verifying parameterized concurrent systems
(such as an n-process token ring for arbitrary n) using logic program evaluation
and deduction [4,16]. 

In [15], we have extended the work reported in this paper to obtain general- 
ized unfold/fold transformation systems for normal logic programs. Aravindan 
and Dung [1] developed an approach to parameterize the correctness proofs of 
the original Tamaki-Sato system with respect to various semantics based on the 
notion of semantic kernels. Incorporating the idea of semantic kernel into our 
framework yields a framework that is parameterized with respect to the measure 
structures as well as semantics. 

In the future, it would be interesting to study whether we can develop similar
parameterized unfold/fold transformation frameworks for other programming
paradigms such as functional and concurrent constraint programming languages
[5,17] as well as process algebraic specification languages (e.g. CCS) [6].

Abstract. We study the problem of an efficient and precise sharing
analysis of (constraint) logic programs. After recognizing that neither 
plain Sharing nor its non-redundant (but equivalent) abstraction scale 
well to real programs, we consider the domain proposed by C. Fecht 
[12,13]. This domain consists of a combination of Pos with a quite weak 
abstraction of Sharing. While verifying that this domain is truly remark- 
able, in terms of both precision and efficiency, we have revealed significant 
precision losses for several real programs. This loss concerns groundness, 
pair-sharing, linearity, but not freeness. (Indeed, we have proved that a 
wide family of abstractions of Sharing do not incur precision loss on free- 
ness.) We define a simple domain for sharing analysis that supports the 
implementation of several widening techniques. In particular, with this 
domain it is straightforward to turn Fecht’s idea into a proper widening. 
More precise widenings are also considered. However, in spite of thor- 
ough experimentation we found that the first widening we propose is 
hard to improve on, provided Pos is included in the domain. We show 
that when Pos is not included, a widening based on cliques of sharing 
pairs is preferred. 

Keywords: Mode Analysis, Sharing Analysis, Widening. 



1 Introduction 

For (constraint) logic programs, the main purpose of sharing analysis is to detect 
pair-sharing; that is, which pairs of variables are definitely independent. In a pre- 
vious work [3] we observed that the Sharing domain of Jacobs and Langen [15] is 
redundant for pair-sharing. This result has important theoretical
consequences (some of which will be exploited in the present work) and is also of
practical interest. In fact, it allows sharing-sets to be kept as small as possible
without any precision loss and the star-union operation, whose complexity is
exponential, to be replaced by self-bin-union, which is quadratic. Even though
significant speed-ups have been observed in practice (up to three orders of
magnitude on the analysis of real programs), the problem of scalability of the
analysis, both in terms of precision (that is, the number of pairs that are detected
as being definitely independent) and of resource usage, was still to be solved.
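The difference in cost between the two operations can be seen on a small sharing set. The sketch below assumes the usual definitions from the sharing-analysis literature, which the text does not restate: binary union combines every pair of sharing groups, self-bin-union is the binary union of a set with itself, and star-union closes a set under union. Representing sharing groups as frozensets of variable names is likewise an assumption.

```python
from typing import FrozenSet, Set

Group = FrozenSet[str]


def bin_union(s1: Set[Group], s2: Set[Group]) -> Set[Group]:
    """All pairwise unions of a group from s1 with a group from s2."""
    return {a | b for a in s1 for b in s2}


def self_bin_union(s: Set[Group]) -> Set[Group]:
    """Binary union of a sharing set with itself: quadratic in the size of s."""
    return bin_union(s, s)


def star_union(s: Set[Group]) -> Set[Group]:
    """Closure of s under union: may contain exponentially many groups."""
    closure = set(s)
    while True:
        bigger = closure | bin_union(closure, s)
        if bigger == closure:
            return closure
        closure = bigger


singletons = {frozenset({v}) for v in ("x", "y", "z")}
```

On the three singleton groups, `self_bin_union` produces the three singletons and the three pairs, while `star_union` additionally contains the full group `{x, y, z}`. With more variables the gap grows quickly, since the closure can hold every non-empty subset.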

In this paper we address this problem. However, in order to give the right 
focus to the present work, we need to explain in detail what we aim at. 

1.1 Analyses and Analyzers 

The experimental part of our work is devoted to the construction of practical, 
precise and efficient data-flow analyzers for constraint logic-based languages. 
Some issues connected with the emphasized words deserve clarification. 

A “practical analyzer” is one that has a chance to be turned into a useful tool. 
On one hand this means that compromising assumptions about the languages 
and the programs to be analyzed must be avoided as far as possible. Researchers 
in our area (including the present authors) have often made assumptions that 
are falsified by the implemented languages and their programs. This state of 
affairs can be justified in a relatively immature field, but this is no longer the 
case for the data-flow analysis of logic programs. Therefore we believe that we 
should now rid ourselves of most, if not all, limiting assumptions. We must take 
into account, for instance, that implemented languages perform unification that 
omits the occur-check; that programmers do exploit “nasty constructs” such as 
assert/1 and call/1; that real programs make use of all kinds of built-ins 
provided by the language; as well as libraries, foreign language interfaces etc. 

Many applications of data-flow analysis, such as semantics-based program- 
ming environments, need very precise information about a program’s behavior in 
order, say, to assist the programmer during development, debugging, and
certification. In the literature there are several papers reporting on the experimental
evaluation of data-flow analyzers. In some of them one can find analysis times
well under a second for non-trivial, lengthy programs. What can one conclude
from the fact that a program of several thousand lines can be analyzed in a
couple of seconds on a desktop computer? If one excludes the possibility outlined
be generalized, the answer is probably that more precision is attainable. One of 
the important applications of data-flow analysis is in computer-assisted program 
verification or certification. In this field, what is not done by the computer must 
be done by hand. Who will spend hours to complete proofs by hand when the 
computer can do them in the same or even double the time? Similar remarks 
hold also for optimized compilation, if one takes into account that (1) only pro- 
duction versions deserve to be compiled with the optimization passes turned on, 
(2) a production version is compiled once and used thousands, perhaps millions 
of times, and (3) computers do work overnight. 

So we do not participate in the race for the fastest ever analysis, especially 
when done (as is often the case) at the expense of precision. The real problem 
is how to increase precision yet avoid the concrete effects of exponential com-
plexity. Consider groundness analysis, for instance. The cruder domains do not
pose any efficiency problem. In contrast, the more refined domains for ground-
ness, such as Pos, work perfectly until you bump into a “nasty” program clause 
(i.e., with more than, say, fifty variables for which the analyzer knows too little 
at that point of the analysis). When this happens, Pos will exhaust your com- 
puter’s memory. One would like to have a more linear, or stable behavior. The 
right solution, as indicated by Cousot and Cousot [11], is not to revert to the 
simpler domains. We should use instead complex domains together with widen- 
ing/narrowing operators. With such techniques we can try to limit precision 
losses to those cases where the cost of the complexity implied by these refined 
domains exceeds the available resources. 

Ideally, it should be possible to endow data-flow analyzers with a knob. The 
user could then “rotate the knob” in order to control the complexity/precision
ratio of the system. The widening/narrowing approach can make this possibility 
a reality. Unfortunately, the design of widening operators tends somewhat to 
escape the realm of theoretical analysis, and thus, in the authors’ opinion, it 
has not been studied enough. Indeed, the development of successful widening 
operators requires, perhaps more than other things, extensive experimentation. 



1.2 Fecht’s Work 



C. Fecht [12,13] proposed a domain $\downarrow SH$ for sharing analysis based on an
abstraction of the usual Jacobs and Langen domain $SH$ [15]. This domain is the
same as $SH$, but the concretization of a set of variables in $\downarrow SH$ is
equivalent to the concretization of its powerset in $SH$. The advantage of
$\downarrow SH$ is not just that an element can be normalized by removing all but
the maximal sets, thereby reducing its size, but also that it enables more
efficient (but less precise) abstract operations than those used for $SH$ and its
non-redundant version $SH^\rho$ [3]. Moreover, for computing the abstract
unification in $\downarrow SH$, Fecht describes two useful optimizations that
improve efficiency without losing any further precision.

One of the problems with the domain $\downarrow SH$ is that it does not capture
ground dependencies. These are important for tracking sharing dependencies and,
hence, sharing. Fecht solved this by deriving the ground dependencies through the
Pos component of the combined domains Pos + $\downarrow SH$ and
Pos + $\downarrow SH$ + Lin. Fecht tested both these domains and showed that, with
his benchmarks, they compared favorably with equivalent ones using $SH$ for the
sharing and ground dependencies. He reported a negligible loss of precision and
demonstrated that large programs could be analyzed using both
Pos + $\downarrow SH$ and Pos + $\downarrow SH$ + Lin in a reasonable time. The
results, although inconclusive, demonstrated real promise for an analyzer based on
the $\downarrow SH$ approach. We call the results inconclusive because only a few
non-trivial programs were tested and, for most of these, precision was not
compared. (Fecht’s $SH$ analyzer could not cope with large programs, possibly
because it performed no redundancy elimination.) We note that Fecht did not
present the domain $\downarrow SH$ as a widening¹ and did not discuss how a
widening based on his domain might be achieved.

1.3 Combining Domains 

In Fecht’s work, and also in the work presented here, the combination of a sharing
domain with Pos is the simplest possible. For any operation of the analysis,
abstract mgu in particular, the Pos component is evaluated first. All sharing groups
containing at least one variable that is definitely ground according to the
resulting Pos formula are removed from the sharing component. This combination is
made particularly efficient by the ready availability of definite groundness
information allowed by the GER representation introduced in [5], where obtaining the
set of definitely ground variables (and also the classes of groundness-equivalent
variables) is a constant-time operation. Note that, theoretically speaking, more
sophisticated combinations of Pos with Sharing are possible [8].
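
The simple combination just described can be sketched in a few lines of Python. This is only an illustration: sharing groups are modelled as frozensets of variable names, and the set of definitely ground variables is assumed to be handed over by some groundness component. The function name `remove_ground` is hypothetical and is not taken from any analyzer.

```python
from typing import FrozenSet, Set

SharingGroup = FrozenSet[str]  # a non-empty set of variable names


def remove_ground(sh: Set[SharingGroup], ground: Set[str]) -> Set[SharingGroup]:
    """Drop every sharing group that mentions a definitely ground variable.

    `ground` is assumed to come from the groundness (Pos) component,
    which is evaluated first.
    """
    return {group for group in sh if not (group & ground)}


# Toy usage: y is known to be ground, so every group containing y disappears.
sh_example = {frozenset({"x"}), frozenset({"x", "y"}), frozenset({"z"})}
reduced = remove_ground(sh_example, {"y"})
```

A ground variable cannot share with anything, which is why a group containing it carries no information and can be discarded.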

Following several other authors, we observed in [3] that, from a practical
point of view, sharing analysis without freeness or linearity does not make sense.
Both these properties make it possible, in a significant proportion of cases, to
dispense with costly operations (such as star-union or, better, self-bin-union [4])
while at the same time increasing the precision of sharing information, and this
with very little overhead. Moreover, freeness is a useful property in itself. For
details on how the combination with freeness is realized, we refer the reader to
[17,19]. See [7] for the combination of both freeness and linearity information.



1.4 Experimental Results 

We have compared the domain of Fecht enhanced with freeness information, that
is Pos + $\downarrow SH$ + Free + Lin, with the same domain where $\downarrow SH$
is replaced by the non-redundant sharing domain $SH^\rho$ [3]. The precision of the
analysis is measured by summing results over the success-patterns, for
goal-independent (GI) analysis, and over both the call- and success-patterns, for
goal-dependent (GD) analysis, for each procedure. For the domains tested, that is,
Pos + $\downarrow SH$ + Free + Lin, abbreviated as P+DSH+F+L, and
Pos + $SH^\rho$ + Free + Lin, abbreviated as P+NSH+F+L, the precision results
consist of: the total number of definitely non-sharing pairs of program variables,
NSP, the total number of definitely ground variables, GV, and the total number of
definitely linear variables that are possibly not ground, LV. The freeness results
are not compared because, as we have shown in [19], freeness is affected neither by
abstracting $SH$ to $\downarrow SH$ nor by redundancy elimination.

The comparison involved all the 92 Prolog programs in our current test-suite.
On 73 of them there was no difference in precision. This is really remarkable
considering that the $\downarrow SH$ approximation is rather crude.

¹ Indeed the approach of Fecht falls under the category “use a simpler domain” which,
as clearly explained in [11], is both contrary and inferior to the approach “use a
complex domain with widening” that is advocated in this paper.

                       Goal-Independent                          Goal-Dependent
                P+DSH+F+L            P+NSH+F+L            P+DSH+F+L            P+NSH+F+L
Program       NSP     GV     LV    NSP     GV     LV    NSP     GV     LV    NSP     GV     LV
[...]       [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]    141     58
caslog       6456   *466   1588   7027   *474   1615  11073  *1625   1079      ?      ?      ?
cg_parser     136     31    159    138     31    160
dpos_an        92     40     76     95     40     76    183     76     53    183     76     53
lg_sys       7274    645   2260   7334    645   2261
nand          473     23    182    475     23    182   1341    481     70   1341    481     70
nbody         261     52    104    262     52    104    477    151     41    478    151     41
oldchina     2185    285   1163   2193    285   1166   3985    802    760      ?      ?      ?
quot_an       288     37    160    288     37    160    639    159    122    646    159    122
reg           774     42    272    796     42    284    207     67     52    207     67     52
rubik          70    *55    110     73    *76     93    174   *110    124    201   *200    103
see            63      0     37     63      0     37    503    174     46    506    174     46
sfecht         28      0     14     85      0     31    221      0     47    278      0     64
simple_an     370     27    139    373     27    139    572     82     76    639     82     76
slice         427    126    453    428    126    453
spsys         788     81    386    800     81    394
trs            32      6     22     53      6     22     73    *12     20    104    *12     20

Table 1. Pos + $\downarrow SH$ + Free + Lin vs Pos + $SH^\rho$ + Free + Lin: precision.



The combined domain Pos + $\downarrow SH$ + Lin is isomorphic to ASub + Pos (where
ASub is the pair-sharing domain of Søndergaard [18]), and the domain
Pos + $\downarrow SH$ is exactly the domain $ASub^{+}$ defined by Cortesi and Filé
in [9]. However, they considered this domain only en passant and only from a
theoretical point of view. In other words, Fecht deserves full credit for having
relied on this domain from a precision/efficiency perspective.

The results for the remaining 19 programs are summarized in Table 1. The
blank entries in the goal-dependent columns are for those programs whose
goal-dependent analysis is pointless. This usually happens because the program
contains a call to an unknown procedure (e.g., by means of call/1). The
China analyzer (i.e., our system [1]) promptly recognizes these cases and reverts
to a goal-independent analysis. This is one of the reasons why focusing only on
goal-dependent analyses is, in our opinion, a mistake. The other reason is
that the ability to analyze libraries once and for all is desirable and, more
generally, so is the separate analysis of different program modules, especially in
very large projects. Focusing only on goal-independent analyses is the opposite
mistake: GD analyses, when possible, are more precise than GI ones. For these
reasons, we insist on presenting experimental results for both.

A star symbol (*) in the GV column signifies that one of the widenings we
employ on the GER representation of Pos fired. This is a widening imposing
a limit on the number of ROBDD nodes simultaneously allocated. It makes
approximations of the R (ROBDD) component when this limit is reached², while
retaining full precision on the G (definitely ground variables) and the E (classes
of equivalent variables) components [2,5]. The scarcity of stars in this and the
following tables shows how seldom this widening is actually required.³

Apart from sfecht, which is a synthetic benchmark designed to show
that arbitrary precision losses are possible with $\downarrow SH$, Table 1
illustrates how heavy precision penalties can be incurred by $\downarrow SH$ even
on real programs. Most notably, for bryant we see a precision loss as high as 28%
on goal-independent analysis (GI) and 42% on goal-dependent analysis (GD). In
addition, simple_an loses 10% (GD), while trs loses 40% (GI) and 30% (GD). Note
that, for these programs, the Pos widening fires only on the GD analysis of trs.
The rubik program shows an interesting phenomenon: here the Pos widening fires,
incurring a precision loss of exactly 1 ground variable (a critical one indeed),
but $SH^\rho$ saves the day by recovering the lost groundness information. A
similar thing happens for caslog. Thus, the widely held opinion (now proved in [8])
that Sharing does not help Pos on groundness does not carry through when widenings
are considered.
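
As a sanity check, the quoted percentages for trs and simple_an can be recomputed from the NSP entries of Table 1. The reading below is an assumption, not something the text states: it takes the loss to be the relative drop in NSP of P+DSH+F+L with respect to P+NSH+F+L.

```latex
\begin{align*}
\text{trs (GI):}       &\quad \frac{53 - 32}{53}    \approx 0.396 \approx 40\% \\
\text{trs (GD):}       &\quad \frac{104 - 73}{104}  \approx 0.298 \approx 30\% \\
\text{simple\_an (GD):} &\quad \frac{639 - 572}{639} \approx 0.105 \approx 10\%
\end{align*}
```

Under that reading the figures agree with the ones reported in the text.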

While space limitations do not allow us to report full timing information, we
can easily confirm Fecht’s claim: the speedup is dramatic. Just a few examples:
the fixpoint computation time in seconds for bmtp, caslog, lg_sys, and spsys
drops from 15.6, 614.7, 735.9, and 2.2, to 0.8, 2.0, 3.3, and 0.6, respectively. All
the experiments described in this paper were performed on a PC equipped with
an AMD K6 at 400 MHz and 128 MB of main memory, running Linux 2.2.1.



1.5 The Present Work 

The objective of this work, after having recognized that Fecht’s approach incurs 
significant precision loss on several real programs, is to improve the state of the 
art in mode analysis, in general, and sharing analysis in particular. 

We started from the observation that, when the sharing-sets become large,
they are at the same time heavy to manipulate and, at least for a subset
of the variables involved, light as far as information content is concerned. We
thus introduce a new representation for set-sharing made of two components.
They are both sharing-sets. However, while the second one is interpreted in the
usual way, the first component records worst-case sharing assumptions on sets of
variables.

We define the operations required for the analysis with this representation, 
and we prove them correct. We also introduce two safe optimizations that turn 
out to be very effective in practice. 

We then show how the proposed representation supports a variety of widen- 
ings. One of those is a simple adaptation of Fecht’s idea. Others are much more 
sophisticated and involve only a limited precision loss. However, in spite of thor- 
ough experimentation (of which only a tiny fraction can be reported here) we 
found that the first widening we propose is hard to improve on, provided Pos 

² That is, by approximating $x \wedge y$ with $x$ or with $y$, $x \vee y$ with true, and so forth.

³ Indeed, the newest version of China also avoids the widening for the caslog program.



420 



Enea Zaffanella et al. 



is included in the domain. This suggests that what is lost by this widening is 
mostly constituted by ground dependencies, and these can be recovered (and 
improved) by the Pos component. We show that when Pos is not included, a 
widening based on cliques of sharing pairs is preferred. Since some authors advo- 
cate the use of Sharing without coupling it with Pos (we do not share this view), 
this is an important message for them. 

Among the contributions of this paper we would like to stress the follow- 
ing: we present a data-flow analysis for groundness, freeness, pair-sharing, and 
linearity, with unprecedented levels of precision and efficiency. With the imple- 
mentation described in this paper, the China analyzer is able to honor one of its 
most important design goals: never crash (e.g., by exhausting all the available 
memory), always terminate with a correct result and in reasonable time. 

The paper is structured as follows: In Section 2 we briefly recall the re- 
quired notions and notations, even though we assume general acquaintance with 
the topics of abstract interpretation, sharing analysis and groundness analysis. 
Section 3 introduces a simple domain for sharing analysis that supports the im- 
plementation of several widening techniques. In particular, with this domain it 
is straightforward to turn Fecht’s idea into a proper widening. This is done in 
Section 4, after the introduction of an infinite family of widenings and the proof 
of their safety. More precise widenings are also considered. The experimental 
evaluation of the proposed approach is presented in Section 5. Section 6 concludes
with some final remarks. The reader is referred to [19] for full proofs of all
the results presented in this paper, and for more material on this subject.

2 Preliminaries 

For any set $S$, $\wp(S)$ denotes the power set of $S$ and $\# S$ is the cardinality
of $S$. A monotone and idempotent self-map $\rho\colon P \to P$ over a poset
$(P, \preceq)$ is called a closure operator (or upper closure operator) if it is
also extensive, namely $\forall x \in P : x \preceq \rho(x)$. In this paper, we
assume there is a fixed and finite set of variables of interest denoted by $VI$. If
$t$ is a first-order term over $VI$, then $vars(t)$ denotes the set of variables in
$t$. $Bind$ denotes the set of equations of the form $x = t$ where $x \in VI$ and
$t$ is a first-order term over $VI$. Note that we do not impose the occur-check
condition $x \notin vars(t)$, since we have proved in [14] that this is not required
to ensure correctness of the operations of Sharing and its derivatives. The
following definition is a simplification of the standard definition for the Sharing
domain [10,14,15] where the set of variables of interest is fixed and finite.

Definition 1. (The set-sharing domain $SH$.) The set $SH$ is defined as
$SH \stackrel{\mathrm{def}}{=} \wp(SG)$, where
$SG \stackrel{\mathrm{def}}{=} \{\, S \in \wp(VI) \mid S \neq \emptyset \,\}$.

We now introduce the required abstract operations over SH . 

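Definition 2. (Some abstract operations over $SH$.) The binary function
$\mathrm{proj}\colon SH \times \wp(VI) \to SH$ projects an element of $SH$ onto a
subset of $VI$: if $sh \in SH$ and $V \in \wp(VI)$, then

$$\mathrm{proj}(sh, V) \stackrel{\mathrm{def}}{=} \{\, S \cap V \mid S \in sh,\ S \cap V \neq \emptyset \,\}.$$

For each $sh \in SH$ and each $V \in \wp(VI)$, the extraction of the relevant
component of $sh$ with respect to $V$ is encoded by the function
$\mathrm{rel}\colon \wp(VI) \times SH \to SH$ defined as

$$\mathrm{rel}(V, sh) \stackrel{\mathrm{def}}{=} \{\, S \in sh \mid S \cap V \neq \emptyset \,\}.$$

For $sh \in SH$ and $V \in \wp(VI)$, the exclusion of the irrelevant component of
$sh$ with respect to $V$ is encoded by the function
$\overline{\mathrm{rel}}\colon \wp(VI) \times SH \to SH$ defined as

$$\overline{\mathrm{rel}}(V, sh) \stackrel{\mathrm{def}}{=} sh \setminus \mathrm{rel}(V, sh).$$

The star-union function $(\cdot)^*\colon SH \to SH$ is given, for each $sh \in SH$, by

$$sh^* \stackrel{\mathrm{def}}{=} \{\, S \in SG \mid \exists n \geq 1 \,.\, \exists T_1, \ldots, T_n \in sh \,.\, S = T_1 \cup \cdots \cup T_n \,\}.$$

For each $sh_1, sh_2 \in SH$, the binary union function
$\mathrm{bin}\colon SH \times SH \to SH$ is given by

$$\mathrm{bin}(sh_1, sh_2) \stackrel{\mathrm{def}}{=} \{\, S_1 \cup S_2 \mid S_1 \in sh_1,\ S_2 \in sh_2 \,\}.$$

We also use the self-bin-union function $\mathrm{sbin}\colon SH \to SH$ which is
given by $\mathrm{sbin}(sh) \stackrel{\mathrm{def}}{=} \mathrm{bin}(sh, sh)$.

The function $\mathrm{amgu}$ captures the effects of a binding on an $SH$ element.
Let $(x = t) \in Bind$, $sh \in SH$, $V_x = \{x\}$, $V_t = vars(t)$, and
$V_{xt} = V_x \cup V_t$. Then

$$\mathrm{amgu}(sh, x = t) \stackrel{\mathrm{def}}{=} \overline{\mathrm{rel}}(V_{xt}, sh) \cup \mathrm{bin}\bigl(\mathrm{rel}(V_x, sh)^*, \mathrm{rel}(V_t, sh)^*\bigr).$$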
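
The operations of Definition 2 translate almost literally into executable form. The following Python sketch is only an illustration under simplifying assumptions: sharing groups are frozensets of variable names, a binding x = t is passed as the variable x together with the set vars(t), and all function names are chosen here for readability rather than taken from any implementation.

```python
def rel(v, sh):
    """Relevant component: the groups of sh that intersect v."""
    return {s for s in sh if s & v}


def rel_bar(v, sh):
    """Irrelevant component: the groups of sh that do not intersect v."""
    return sh - rel(v, sh)


def bin_union(sh1, sh2):
    """Binary union: all pairwise unions of groups from sh1 and sh2."""
    return {s1 | s2 for s1 in sh1 for s2 in sh2}


def star(sh):
    """Star-union: closure of sh under union of one or more groups."""
    result = set(sh)
    while True:
        bigger = result | bin_union(result, sh)
        if bigger == result:
            return result
        result = bigger


def amgu(sh, x, t_vars):
    """Abstract mgu for the binding x = t, where t_vars = vars(t)."""
    vx = frozenset({x})
    vt = frozenset(t_vars)
    vxt = vx | vt
    return rel_bar(vxt, sh) | bin_union(star(rel(vx, sh)), star(rel(vt, sh)))


# Toy usage: x, y and z are independent; after x = f(y), x and y may share.
sh0 = {frozenset({"x"}), frozenset({"y"}), frozenset({"z"})}
sh1 = amgu(sh0, "x", {"y"})
```

The fixpoint loop in `star` is the simplest way to obtain the closure; it makes the cost of star-union visible, since the result can grow exponentially in the number of groups.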

The domain SH captures set-sharing. However, the property we wish to de- 
tect is pair-sharing and, for this, it has been shown that SH includes unwanted 
redundancy [3]. 

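Definition 3. (Redundancy.) Let $sh \in SH$ and $S \in SG$. $S$ is redundant for
$sh$ if and only if $\# S > 2$ and
$\mathrm{pairs}(S) = \bigcup \{\, \mathrm{pairs}(T) \mid T \in sh,\ T \subset S \,\}$, where

$$\mathrm{pairs}(S) \stackrel{\mathrm{def}}{=} \{\, P \in \wp(S) \mid \# P = 2 \,\}.$$

Definition 4. (The domain $SH^\rho$.) The function $\rho\colon SH \to SH$ is given,
for each $sh \in SH$, by

$$\rho(sh) \stackrel{\mathrm{def}}{=} sh \cup \{\, S \in SG \mid S \text{ is redundant for } sh \,\}.$$

Then $SH^\rho \stackrel{\mathrm{def}}{=} \rho(SH) = \{\, \rho(sh) \mid sh \in SH \,\}$.

We use the notation $sh_1 =_\rho sh_2$ and $sh_1 \subseteq_\rho sh_2$ to denote
$\rho(sh_1) = \rho(sh_2)$ and $\rho(sh_1) \subseteq \rho(sh_2)$, respectively. The
advantage of $SH^\rho$ is that we can replace the star-union operation in the
definition of the $\mathrm{amgu}$ by self-bin-union without loss of precision [3].
In particular, it is shown that

$$\mathrm{amgu}(sh, x = t) =_\rho \overline{\mathrm{rel}}(V_{xt}, sh) \cup \mathrm{bin}\bigl(\mathrm{sbin}(\mathrm{rel}(V_x, sh)), \mathrm{sbin}(\mathrm{rel}(V_t, sh))\bigr). \qquad (1)$$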
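
Redundancy and the closure operator that adds redundant groups can be checked by brute force on small examples. The Python sketch below is an illustration only: it enumerates every candidate group over a given variable set, which is exponential and meant for toy inputs, and the names `pairs`, `is_redundant` and `rho` are chosen here to mirror Definitions 3 and 4.

```python
from itertools import combinations


def pairs(s):
    """All two-element subsets of the group s."""
    return {frozenset(p) for p in combinations(sorted(s), 2)}


def is_redundant(s, sh):
    """s is redundant for sh if it has more than two variables and its pairs
    are exactly those covered by the groups of sh strictly contained in s."""
    if len(s) <= 2:
        return False
    covered = set()
    for t in sh:
        if t < s:  # proper subset
            covered |= pairs(t)
    return pairs(s) == covered


def rho(sh, variables):
    """Add to sh every group over `variables` that is redundant for sh."""
    ordered = sorted(variables)
    candidates = (
        frozenset(c)
        for n in range(3, len(ordered) + 1)
        for c in combinations(ordered, n)
    )
    return set(sh) | {s for s in candidates if is_redundant(s, sh)}


# Toy usage: xyz adds no pair-sharing beyond what xy, xz and yz already say.
xy, xz, yz = frozenset("xy"), frozenset("xz"), frozenset("yz")
closed = rho({xy, xz, yz}, {"x", "y", "z"})
```

In the example, the group xyz is redundant because each of its pairs already appears in a smaller group, so adding or removing it does not change the pair-sharing information.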

3 A New Representation for Set-Sharing 

We introduce here a new representation for set-sharing. It is made up of two 
components: one is the original set-sharing domain while the other represents all 
possible subsets of each of its elements and, for this reason, is called a clique-set. 

Definition 5. (Clique-set.) A clique-set is an element of $CL$, where
$CL \stackrel{\mathrm{def}}{=} SH$. An element of a clique-set is called a clique.

Definition 6. (Sharing-sets representation for clique-sets.) The (overloaded)
functions $\downarrow\colon SG \to SH$ and $\downarrow\colon CL \to SH$ are given,
for each $C \in SG$ and each $cl \in CL$, by

$$\downarrow C \stackrel{\mathrm{def}}{=} \wp(C) \setminus \{\emptyset\} \qquad\text{and}\qquad \downarrow cl \stackrel{\mathrm{def}}{=} \bigcup_{C \in cl} \downarrow C.$$

Observe that $\downarrow$ is an upper closure operator over $SH$. If $cl \in CL$ and
$C \in SG$ then we say that $C$ is down-redundant in $cl$ if there exists
$C' \in cl$ such that $C \subset C'$.

The addition or removal of down-redundant elements to or from a clique-set
makes no difference to the sharing-sets that it represents. So, a clique represents
a worst-case⁴ pair-sharing condition on the set of variables it contains.

In an implementation, as we need to keep the clique-sets as small as possible, 
down-redundant cliques are removed via a normalization function. 

Definition 7. (Normalization of clique-sets.) The normalization function
$|\cdot|\colon CL \to CL$ is given, for each $cl \in CL$, by

$$|cl| \stackrel{\mathrm{def}}{=} cl \setminus \{\, C \in cl \mid C \text{ is down-redundant for } cl \,\}.$$
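
To make the clique-set notions concrete, here is a small Python sketch of the downward closure and of normalization. It is an illustration under the assumption that cliques are frozensets of variable names; the function names `down`, `down_set` and `normalize` are hypothetical.

```python
from itertools import combinations


def down(clique):
    """All non-empty subsets of a clique."""
    items = sorted(clique)
    return {
        frozenset(sub)
        for n in range(1, len(items) + 1)
        for sub in combinations(items, n)
    }


def down_set(cl):
    """Sharing-set represented by a clique-set: union of the closures."""
    result = set()
    for clique in cl:
        result |= down(clique)
    return result


def normalize(cl):
    """Remove cliques that are strictly contained in another clique."""
    return {c for c in cl if not any(c < other for other in cl)}


# Toy usage: xy is down-redundant because it is contained in xyz.
cl_example = {frozenset("xyz"), frozenset("xy"), frozenset("w")}
cl_normal = normalize(cl_example)
```

Normalization keeps only the maximal cliques, and the represented sharing-set is unchanged, which is what makes the compact form safe to use.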

We now define abstract unification over clique-sets and state its soundness. 

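Definition 8. (Abstract unification over cliques.) For each $V \in \wp(VI)$
and each $cl \in CL$, the function
$\overline{\mathrm{rel}}^{cl}\colon \wp(VI) \times CL \to CL$ is given by

$$\overline{\mathrm{rel}}^{cl}(V, cl) \stackrel{\mathrm{def}}{=} \{\, C \setminus V \mid C \in cl \,\} \setminus \{\emptyset\}.$$

The function $\mathrm{amgu}^{cl}\colon CL \times Bind \to CL$ is given, for each
$cl \in CL$ and each $(x = t) \in Bind$, by

$$\mathrm{amgu}^{cl}(cl, x = t) \stackrel{\mathrm{def}}{=} \overline{\mathrm{rel}}^{cl}(V_{xt}, cl) \cup \mathrm{bin}\bigl(\mathrm{sbin}(cl_x), \mathrm{sbin}(cl_t)\bigr),$$

where $cl_x = \mathrm{rel}(V_x, cl)$, $cl_t = \mathrm{rel}(V_t, cl)$, $V_x = \{x\}$,
$V_t = vars(t)$, and, finally, $V_{xt} = V_x \cup V_t$.

Theorem 1. For each $cl \in CL$ and each $(x = t) \in Bind$,

$$\mathrm{amgu}(\downarrow cl, x = t) \subseteq_\rho \downarrow \mathrm{amgu}^{cl}(cl, x = t).$$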

Because cliques represent their downward closure, $\mathrm{amgu}^{cl}$ introduces
down-redundant cliques when both the relevant components are non-empty. As already
observed (without proof) and implemented by Fecht, there are two optimizations
for computing the $\mathrm{amgu}^{cl}$ that enable useful efficiency improvements.
These are reformulated in Section 3.1 for the domains defined here.

We next define our new sharing domain for widening. 

⁴ While this terminology is due to Langen [16], our definition differs from the one
he used.

Definition 9. (The $SH^W$ representation.) The set $SH^W$ is given by

$$SH^W \stackrel{\mathrm{def}}{=} \{\, (cl, sh) \mid cl \in CL,\ sh \in SH \,\}$$

and is ordered by $\subseteq$ defined as follows, for each
$(cl_1, sh_1), (cl_2, sh_2) \in SH^W$:

$$(cl_1, sh_1) \subseteq (cl_2, sh_2) \iff (cl_1 \subseteq cl_2) \wedge (sh_1 \subseteq sh_2).$$

It can be seen that $SH^W$ is a complete lattice.

The sharing-set represented by an element of $SH^W$ is given by the function
$\Downarrow(\cdot)\colon SH^W \to SH$ defined, for each $(cl, sh) \in SH^W$, by
$\Downarrow\bigl((cl, sh)\bigr) \stackrel{\mathrm{def}}{=} \downarrow cl \cup sh$.
The normalization of an element of $SH^W$ is given by
$|\cdot|\colon SH^W \to SH^W$ defined, for each $(cl, sh) \in SH^W$, by
$|(cl, sh)| \stackrel{\mathrm{def}}{=} (|cl|, sh \setminus \downarrow cl)$.

The normalization removes unnecessary elements from a description in $SH^W$. We
now define an upper closure operator $g$ inducing an equivalence relation on the
elements of $SH^W$.

Definition 10. (The $g(SH^W)$ domain.) The function $g\colon SH^W \to SH^W$ is
given, for each $shw \in SH^W$ with $shw = (cl, sh)$, by

$$g(shw) \stackrel{\mathrm{def}}{=} \bigl(\rho(\downarrow cl),\ \rho(\Downarrow(shw))\bigr).$$

Then $g$ is an upper closure operator for $SH^W$ [19]. We will use the notation
$shw_1 =_g shw_2$ to denote $g(shw_1) = g(shw_2)$ and $shw_1 \subseteq_g shw_2$ to
denote $g(shw_1) \subseteq g(shw_2)$.

The ordering $\subseteq_g$ is used for modeling the relative precision between
widenings in Section 4. When $shw_1 =_g shw_2$, $shw_1$ and $shw_2$ behave the same
way as far as representing pair-sharing and groundness is concerned.

Proposition 1. If $shw \in SH^W$, then $\Downarrow(shw) =_\rho \Downarrow(g(shw))$
and $shw =_g |shw|$.

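Definition 11. (Operations over $SH^W$.) For each $(cl, sh), (cl_i, sh_i) \in SH^W$,
$i = 1, 2$, and each $V \in \wp(VI)$, the functions
$\mathrm{rel}^W, \overline{\mathrm{rel}}^W\colon \wp(VI) \times SH^W \to SH^W$
and $\cup^W, \mathrm{bin}^W\colon SH^W \times SH^W \to SH^W$, the function
$\mathrm{sbin}^W\colon SH^W \to SH^W$ and
$\mathrm{amgu}^W\colon SH^W \times Bind \to SH^W$ are defined as follows:

$$\mathrm{rel}^W\bigl(V, (cl, sh)\bigr) \stackrel{\mathrm{def}}{=} \bigl(\mathrm{rel}(V, cl),\ \mathrm{rel}(V, sh)\bigr),$$

$$\overline{\mathrm{rel}}^W\bigl(V, (cl, sh)\bigr) \stackrel{\mathrm{def}}{=} \bigl(\overline{\mathrm{rel}}^{cl}(V, cl),\ \overline{\mathrm{rel}}(V, sh)\bigr),$$

$$(cl_1, sh_1) \cup^W (cl_2, sh_2) \stackrel{\mathrm{def}}{=} (cl_1 \cup cl_2,\ sh_1 \cup sh_2),$$

$$\mathrm{bin}^W\bigl((cl_1, sh_1), (cl_2, sh_2)\bigr) \stackrel{\mathrm{def}}{=} \bigl(\mathrm{bin}(cl_1, cl_2) \cup \mathrm{bin}(cl_1, sh_2) \cup \mathrm{bin}(sh_1, cl_2),\ \mathrm{bin}(sh_1, sh_2)\bigr),$$

$$\mathrm{sbin}^W\bigl((cl, sh)\bigr) \stackrel{\mathrm{def}}{=} \mathrm{bin}^W\bigl((cl, sh), (cl, sh)\bigr) = \bigl(\mathrm{sbin}(cl) \cup \mathrm{bin}(cl, sh),\ \mathrm{sbin}(sh)\bigr),$$

$$\mathrm{amgu}^W(shw, x = t) \stackrel{\mathrm{def}}{=} \overline{\mathrm{rel}}^W(V_{xt}, shw) \cup^W \mathrm{bin}^W\bigl(\mathrm{sbin}^W(\mathrm{rel}^W(V_x, shw)),\ \mathrm{sbin}^W(\mathrm{rel}^W(V_t, shw))\bigr),$$

where $V_x = \{x\}$, $V_t = vars(t)$, and $V_{xt} = V_x \cup V_t$.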

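The operations over the two-component representation can be prototyped directly from the definition above. The following Python sketch is an illustration only: a description is a pair (clique-set, sharing-set) of Python sets of frozensets, a binding x = t is passed as x plus the set vars(t), and every function name is chosen here for readability.

```python
def rel(v, sh):
    return {s for s in sh if s & v}


def rel_bar(v, sh):
    return sh - rel(v, sh)


def rel_bar_cl(v, cl):
    """Exclusion on cliques: remove the variables in v from every clique."""
    return {c - v for c in cl} - {frozenset()}


def bin_union(a, b):
    return {s1 | s2 for s1 in a for s2 in b}


def rel_w(v, shw):
    cl, sh = shw
    return (rel(v, cl), rel(v, sh))


def rel_bar_w(v, shw):
    cl, sh = shw
    return (rel_bar_cl(v, cl), rel_bar(v, sh))


def union_w(a, b):
    return (a[0] | b[0], a[1] | b[1])


def bin_w(a, b):
    (cl1, sh1), (cl2, sh2) = a, b
    cliques = bin_union(cl1, cl2) | bin_union(cl1, sh2) | bin_union(sh1, cl2)
    return (cliques, bin_union(sh1, sh2))


def sbin_w(a):
    return bin_w(a, a)


def amgu_w(shw, x, t_vars):
    """Abstract mgu on (clique-set, sharing-set) pairs for the binding x = t."""
    vx = frozenset({x})
    vt = frozenset(t_vars)
    vxt = vx | vt
    relevant = bin_w(sbin_w(rel_w(vx, shw)), sbin_w(rel_w(vt, shw)))
    return union_w(rel_bar_w(vxt, shw), relevant)


# Toy usage: one clique xyz and one ordinary group w, with the binding x = w.
shw_example = ({frozenset("xyz")}, {frozenset("w")})
shw_result = amgu_w(shw_example, "x", {"w"})
```

In the toy run the clique xyz loses x on the irrelevant side (leaving yz), while the relevant side merges the clique with w into the single clique wxyz. Anything combined with a clique lands in the first component.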
The next two theorems, proven in [19], state the correctness of $\mathrm{amgu}^W$
and that normalization does not affect the correctness or precision of
$\mathrm{amgu}^W$.

Theorem 2. For each $shw \in SH^W$ and each $(x = t) \in Bind$,

$$\mathrm{amgu}\bigl(\Downarrow(shw), x = t\bigr) \subseteq_\rho \Downarrow\bigl(\mathrm{amgu}^W(shw, x = t)\bigr).$$

Theorem 3. For each $shw \in SH^W$ and each $(x = t) \in Bind$,

$$\mathrm{amgu}^W(shw, x = t) =_g \mathrm{amgu}^W(|shw|, x = t).$$

In general, $=_g$ is not a congruence for $\mathrm{amgu}^W$ and precision may be
lost when the first component of $shw$ is non-empty and the second component of
$shw$ contains redundant elements. Further work on this aspect is ongoing.

In Eq. (1), the basic amgu operation is defined using the $SH^\rho$ domain. However,
when we have freeness and linearity information it has been proven that
we can avoid one or both of the self-bin-unions occurring as components of the
binary union operation. The question arises as to whether this optimization can
be applied when we have the $\mathrm{amgu}^W$ operation for the $SH^W$ domain. That
is, can we avoid the corresponding $\mathrm{sbin}^W$ operations under the same
linearity and freeness conditions? The answer is yes, we can generalize Theorem 2
and show that such an optimization is sound. However, the optimization may lose
precision and further work on this aspect is ongoing.

3.1 Optimizations

We can optimize the computation of $\mathrm{amgu}^W$ in two ways. To explain these,
we need some extra notation. Let: $shw = (cl, sh) \in SH^W$ and $(x = t) \in Bind$;
$V_x = \{x\}$, $V_t = vars(t)$, and $V_{xt} = V_x \cup V_t$;
$shw_x = (cl_x, sh_x) = \mathrm{rel}^W(V_x, shw)$ and
$shw_t = (cl_t, sh_t) = \mathrm{rel}^W(V_t, shw)$; and, finally,

$$shw_{rel} = (cl_{rel}, sh_{rel}) = \mathrm{bin}^W\bigl(\mathrm{sbin}^W(shw_x),\ \mathrm{sbin}^W(shw_t)\bigr).$$

Theorem 4. If neither $shw_x = (\emptyset, \emptyset)$ nor
$shw_t = (\emptyset, \emptyset)$, then

$$\overline{\mathrm{rel}}^W(V_{xt}, shw) = \bigl(\overline{\mathrm{rel}}(V_{xt}, cl) \cup cl',\ \overline{\mathrm{rel}}(V_{xt}, sh)\bigr),$$

where $cl' \subseteq \downarrow cl_{rel}$.

Theorem 5. Suppose that $C_x = \bigcup cl_x$, $C_t = \bigcup cl_t$,
$S_x = \bigcup sh_x$, $S_t = \bigcup sh_t$,
$A_{xt} = \mathrm{bin}(\mathrm{sbin}(sh_x), \mathrm{sbin}(sh_t))$,
$B_{xt} = \mathrm{bin}(sh_x, sh_t)$, $B_x = \mathrm{bin}(cl_x, sh_x)$, and
$B_t = \mathrm{bin}(cl_t, sh_t)$. Let also

$$shw^{\diamond}_{xt} \stackrel{\mathrm{def}}{=}
\begin{cases}
\bigl(\{C_x \cup C_t \cup S_x \cup S_t\},\ \emptyset\bigr), & \text{if } cl_x \neq \emptyset,\ cl_t \neq \emptyset; \\
\bigl(\{C_x \cup S_t\} \cup B_x \cup B_{xt},\ A_{xt}\bigr), & \text{if } cl_x \neq \emptyset,\ cl_t = \emptyset; \\
\bigl(\{C_t \cup S_x\} \cup B_t \cup B_{xt},\ A_{xt}\bigr), & \text{if } cl_x = \emptyset,\ cl_t \neq \emptyset; \\
(\emptyset,\ A_{xt}), & \text{if } cl_x = \emptyset,\ cl_t = \emptyset.
\end{cases}$$

Then,

$$shw_{rel} =_g
\begin{cases}
shw^{\diamond}_{xt}, & \text{if } shw_x \neq (\emptyset, \emptyset),\ shw_t \neq (\emptyset, \emptyset); \\
(\emptyset, \emptyset), & \text{otherwise.}
\end{cases}$$

Both these results are proven in [19]. These can then be combined to provide
the following optimization (also proven in [19]) for the computation of
$\mathrm{amgu}^W$.

Corollary 1. Assuming the notation used in Theorem 5, then

$$\mathrm{amgu}^W(shw, x = t) =_g
\begin{cases}
\bigl(\overline{\mathrm{rel}}(V_{xt}, cl),\ \overline{\mathrm{rel}}(V_{xt}, sh)\bigr) \cup^W shw^{\diamond}_{xt}, & \text{if } shw_x \neq (\emptyset, \emptyset),\ shw_t \neq (\emptyset, \emptyset); \\
\overline{\mathrm{rel}}^W(V_{xt}, shw), & \text{otherwise.}
\end{cases}$$

Observe that this result applies to the basic $\mathrm{amgu}^W$ operation as given
in Definition 11. When one or both of the self-bin-unions here is omitted due to
available freeness and linearity information, then $=_g$ in the corollary becomes
$\subseteq_g$ and we may lose further precision. Further work on this subject is
ongoing.



4 Widening Set-Sharing 

We can now define a family of unary widenings over $SH^W$.

Definition 12. (Widening for $SH^W$.) The function $\nabla\colon SH^W \to SH^W$ is a
widening for $SH^W$ if, for each $shw \in SH^W$, we have
$shw \subseteq_g \nabla\, shw$.

The following result establishes the safety of such widening operators.

Theorem 6. For each $shw \in SH^W$ and each $(x = t) \in Bind$ we have

$$\mathrm{amgu}^W(shw, x = t) \subseteq_g \mathrm{amgu}^W(\nabla\, shw, x = t).$$

The obvious corollary is that any analysis using these widenings, possibly
a different widening at each step of the analysis, is correct. After widening we
always normalize the resulting description to provide a smaller representation.
Moreover, it is also shown in [19] that similar results hold for each of the
component operators, such as $\mathrm{bin}^W$, for $\mathrm{amgu}^W$. Thus we can
(and do) safely widen and normalize within the actual computation of
$\mathrm{amgu}^W$. The analyzer has the freedom of using whichever widening suits
its current needs. Those needs can be dictated by a number of heuristics. Of course,
really useful widenings are guarded by some applicability condition. The simplest
conditions are those based on the cardinality of the sets in the $SH^W$ description.
For example, for each widening $\nabla$ and for suitable choices of
$f\colon \mathbb{N}^2 \to \mathbb{N}$ and $n \in \mathbb{N}$, one can define

$$\nabla_{f,n}(cl, sh) \stackrel{\mathrm{def}}{=}
\begin{cases}
\nabla(cl, sh), & \text{if } f(\# cl, \# sh) > n; \\
(cl, sh), & \text{otherwise.}
\end{cases}$$

We order the widenings in the obvious way. If $\nabla_1$ and $\nabla_2$ are two
widenings and, for all $shw$, $\nabla_1(shw) \subseteq_g \nabla_2(shw)$, then let
$\nabla_1 \subseteq_g \nabla_2$, meaning that $\nabla_1$ is more precise than
$\nabla_2$.

At the top end of the scale of widenings we have two panic widenings. They 
are defined by 



sh) (^cl U s/i}, 0^ , 

sh) ({U U • 



The panic widenings are present in the China implementation, with very strict 
guards, only to obey the “never crash” motto: no real program we have access 
to makes them fire. 

At the other extreme we have very soft widenings.

Definition 13. (Cautious widening.) A widening $\nabla\colon SH^W \to SH^W$ is
called a cautious widening if, for each $shw \in SH^W$,

$$\Downarrow(\nabla\, shw) =_\rho \Downarrow(shw).$$

Thus, a widening is cautious if it is invariant with respect to the set-sharing
representation. In particular, it never introduces new pair-sharings nor new
singletons in the description. However, information is lost as soon as the
operations for the analysis given by Definition 11 are considered. For example,
consider two elements of $SH^W$:
$shw_1 \stackrel{\mathrm{def}}{=} (\emptyset, \{x, y, z, xy, xz, yz\})$ and
$shw_2 \stackrel{\mathrm{def}}{=} (\{xyz\}, \emptyset)$, so that we have
$\Downarrow(shw_1) =_\rho \Downarrow(shw_2)$ but $g(shw_1) \neq g(shw_2)$. While
sharing between $y$ and $z$ is not contemplated in
$\mathrm{rel}^W(\{x\}, shw_1) = (\emptyset, \{x, xy, xz\})$, the same does not hold
for $\mathrm{rel}^W(\{x\}, shw_2) = shw_2$.
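
The example just given can be replayed mechanically. The Python sketch below is an illustration only: descriptions are pairs (clique-set, sharing-set) of sets of frozensets, `represented` computes the sharing-set denoted by such a pair, and all names are chosen here for the purpose of the example.

```python
from itertools import combinations


def group(*variables):
    return frozenset(variables)


def down(clique):
    """All non-empty subsets of a clique."""
    items = sorted(clique)
    return {
        frozenset(sub)
        for n in range(1, len(items) + 1)
        for sub in combinations(items, n)
    }


def represented(shw):
    """Sharing-set denoted by a (clique-set, sharing-set) pair."""
    cl, sh = shw
    result = set(sh)
    for clique in cl:
        result |= down(clique)
    return result


def rel(v, sh):
    return {s for s in sh if s & v}


def rel_w(v, shw):
    cl, sh = shw
    return (rel(v, cl), rel(v, sh))


x, y, z = "x", "y", "z"
shw1 = (set(), {group(x), group(y), group(z),
                group(x, y), group(x, z), group(y, z)})
shw2 = ({group(x, y, z)}, set())

r1 = rel_w(group(x), shw1)  # keeps only x, xy and xz
r2 = rel_w(group(x), shw2)  # keeps the whole clique xyz
```

After extracting the component relevant to x, the first description no longer mentions any sharing between y and z, whereas the clique in the second one still implies it. This is the information that a cautious widening can lose once the abstract operations are applied.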

A useful cautious widening is the gentle widening, defined as follows. Consider
$shw = (cl, sh) \in SH^W$, and let us define the undirected graph
$G \stackrel{\mathrm{def}}{=} (N, E)$ such that
$N \stackrel{\mathrm{def}}{=} \{\, x \mid \{x\} \in \Downarrow(shw) \,\}$ and
$E \stackrel{\mathrm{def}}{=} \{\, (x, y) \mid \{x, y\} \in \Downarrow(shw),\ x, y \in N,\ x \neq y \,\}$.

Then

$$\nabla^G shw \stackrel{\mathrm{def}}{=} \bigl(\{C_1, \ldots, C_k\},\ sh\bigr),$$

where $C_1, \ldots, C_k$ are all the maximal cliques of $G$. Note that, although the
problem of enumerating all the maximal cliques of an undirected graph is
NP-complete, this does not seem to be a problem for the graphs arising during the
analysis of even the biggest real programs. For the experimentation we used
the algorithm by Bron and Kerbosch [6], which is Algorithm 457 in the ACM
collection, even though more efficient algorithms are present in the literature.

Of intermediate precision is the widening based on Fecht’s idea, which we
will call Fecht’s widening. It is simply given by

$$\nabla^F(cl, sh) \stackrel{\mathrm{def}}{=} (cl \cup sh,\ \emptyset).$$

                       Goal-Independent                          Goal-Dependent
                P+WSH+F+L            P+NSH+F+L            P+WSH+F+L            P+NSH+F+L
Program       NSP     GV     LV    NSP     GV     LV    NSP     GV     LV    NSP     GV     LV
aqua_c      11147   *406   2757      ?      ?      ?  16364  *1188   2028      ?      ?      ?
caslog       6553   *474   1615   7027   *474   1615  11338  *1739   1062      ?      ?      ?
oldchina     2193    285   1166   2193    285   1166   3985    802    760      ?      ?      ?
quot_an       288     37    160    288     37    160    639    159    122    646    159    122

Table 2. Pos + $SH^W$ + Free + Lin vs Pos + $SH^\rho$ + Free + Lin using $\nabla^F_{100}$: precision.



This widening is not cautious. However, it does not introduce new pairs. As it 
can introduce new singletons, it may destroy ground dependencies, and this is 
why this kind of widening is better coupled with Pos. 
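
The widenings discussed in this section are easy to prototype. The Python sketch below is an illustration under several assumptions: descriptions are pairs (clique-set, sharing-set) of sets of frozensets, the guard only looks at the number of ordinary sharing groups, and the maximal cliques are enumerated with the basic Bron and Kerbosch recursion without pivoting. All function names are hypothetical.

```python
from itertools import combinations


def fecht_widening(shw):
    """Move every sharing group into the clique component."""
    cl, sh = shw
    return (cl | sh, set())


def guarded(widening, threshold):
    """Apply `widening` only when the sharing component exceeds `threshold`."""
    def apply(shw):
        _, sh = shw
        return widening(shw) if len(sh) > threshold else shw
    return apply


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration over an adjacency dictionary."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def gentle_widening(shw):
    """Replace the clique component by the maximal cliques of the graph whose
    nodes are the represented singletons and whose edges are the represented pairs."""
    cl, sh = shw
    nodes = {v for c in cl for v in c} | {v for s in sh if len(s) == 1 for v in s}
    adj = {v: set() for v in nodes}

    def link(a, b):
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)

    for c in cl:
        for a, b in combinations(sorted(c), 2):
            link(a, b)
    for s in sh:
        if len(s) == 2:
            a, b = sorted(s)
            link(a, b)
    return ({c for c in maximal_cliques(adj) if c}, set(sh))


g = frozenset
shw_a = (set(), {g("x"), g("y"), g("z"), g("xy"), g("xz"), g("yz")})
gentle = gentle_widening(shw_a)               # one clique xyz, sh unchanged
fecht = fecht_widening((set(), {g("xy"), g("z")}))
lazy = guarded(fecht_widening, 100)(shw_a)    # below threshold: unchanged
eager = guarded(fecht_widening, 1)(shw_a)     # above threshold: widened
```

In the Fecht example the group xy becomes a clique, so the singletons x and y become possible where they were not before. This is the kind of lost ground dependency that a groundness component can recover.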



5 Experimental Evaluation 

For the experimental evaluation of Fecht’s widening $\nabla^F$, precision is
compared with respect to the non-redundant sharing domain $SH^\rho$. In fact, this
approach is almost always as precise as the optimal one using $SH^\rho$.

For this and the following experiments, the widening was guarded by a size
threshold of 100 on the second component (i.e., the normal sharing part). In other
words, immediately before each abstract mgu operation the analyzer performed
redundancy elimination, as usual. If after this the operand $(cl, sh)$ was such
that $\# sh > 100$, then $(cl, sh)$ was replaced by $\nabla^F(cl, sh)$. Let us call
this guarded widening $\nabla^F_{100}$. The results are reported in Table 2. Note
that only the programs where the analysis with $SH^W$ gives different results from
the analysis with $SH^\rho$ are reported in the table. Thus, for all the programs in
the test-suite, the analysis with $SH^W$ using the (rather drastic) widening
$\nabla^F_{100}$ gives the same results obtainable (at a much higher cost) with
$SH^\rho$, apart from those in Table 2. For aqua_c we obtain termination in
reasonable time, as with Fecht’s technique but with higher precision. The same holds
for the GD analysis of caslog and oldchina. However, while the GI analysis of
oldchina is “optimal” (meaning “as precise as $SH^\rho$”), this is not the case for
caslog. Non-optimality also occurs for the GD analysis of quot_an.

Obviously, $\nabla_{100}$ is never less precise than Fecht’s domain. What is
surprising, however, is that it is almost as efficient. The timings and the
number of applications of the widening are reported in Table 3 for all the
programs such that at least one timing was above 0.4 seconds. The first
observation to be made is that the widening comes into play only a few times
on the test-suite. On average, it is safe to say that in 99.9% of the cases the
sharing-sets remain of reasonable size (100 groups or less in this experiment).
Table 3 says that this definition of “reasonable” makes sense: for those
programs where widening does not take place, the difference in performance
between Fecht’s domain and our domain with the $\nabla_{100}$ widening is very
limited. Analysis of aqua_c shows that limiting precision

                  Goal-Independent             Goal-Dependent
               P+DSH+F+L    P+WSH+F+L       P+DSH+F+L    P+WSH+F+L
Program                T          T   #W            T          T   #W
action               0.1        0.1    1          1.1        1.4    1
aircraft             0.2        0.2    0          0.7        0.7    0
aqua_c              10.4       10.9   56         48.6       40.7    3
bmtp                 0.8        0.9    6
bryant               0.1        0.1    0          0.6        1.4    1
caslog               2.0        2.5   17         17.7       19.2   22
chat80               0.9        1.0    2          4.3        4.9    6
chat_parser          0.4        0.4    1          1.7        1.8    1
dpos_an              0.2        0.2    0          0.5        0.8    1
eliza                0.1        0.1    0          0.2        0.4    1
lg_sys               3.3        3.9   23
log_interp           0.2        0.4    2          0.7        0.9    1
mixtus               0.9        0.9    4
oldchina             1.2        1.4   11          7.7        8.3    4
parser_dcg           0.2        0.1    0          0.7        0.6    0
peephole             0.1        0.1    0          0.4        0.7    1
pets_an              0.8        0.9    4          4.5        4.5    1
peval                0.2        0.3    3          0.4        0.5    1
plaiclp              0.7        0.7    3
press                0.1        0.1    0          0.4        0.7    0
quot_an              0.3        0.4    0          1.3        1.7    1
read                 0.1        0.1    0          0.3        0.6    1
reg                  0.4        0.4    4          0.4        0.4    1
sdda                 0.1        0.1    1          0.2        0.4    2
sim                  0.2        0.3    2          0.7        0.8    2
simple_an            0.1        0.2    0          0.6        0.9    2
slice                0.6        0.7    2
spsys                0.6        0.7    5
trs                  0.2        0.3    2          0.5        0.6    1
unify                0.1        0.1    0          0.5        0.7    0

Table 3. Pos + $\downarrow SH$ + Free + Lin vs Pos + $SH^W$ + Free + Lin using
$\nabla_{100}$: timings (T) and number of (sharing) widenings (#W).



may cost (less precision means more self-bin-unions to perform, thus even less 
precision, ... ). 
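
To make the cost argument concrete, the following minimal Python sketch illustrates what a self-bin-union computes under the usual set-sharing reading, where a sharing set is a set of sharing groups and each group is a set of variables. The function names `bin_union` and `self_bin_union` and the example sharing set are illustrative assumptions; this is not the code of the analyzer used in the experiments.

```python
from itertools import product

def bin_union(sh1, sh2):
    """Binary union: every pairwise union of a group of sh1 with a group of sh2."""
    return {g1 | g2 for g1, g2 in product(sh1, sh2)}

def self_bin_union(sh):
    """Binary union of a sharing set with itself (assumed reading of the term)."""
    return bin_union(sh, sh)

# Hypothetical sharing set over the variables x, y, z.
sh = {frozenset({"x"}), frozenset({"y"}), frozenset({"z"})}
once = self_bin_union(sh)     # adds all the two-variable groups
twice = self_bin_union(once)  # adds the group {x, y, z}
print(len(sh), len(once), len(twice))  # prints: 3 6 7
```

Even on this tiny example a single application doubles the number of groups, which gives an idea of why each additional self-bin-union on a large sharing set is costly.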

The results on the precision of $\nabla_{100}$ are so good that we are left
with a ridiculous test-suite for checking how much we can improve by using a
more cautious widening. Our experimentation showed that the gentle widening
improves over $\nabla_{100}$ only on quot_an. The same result is obtained, at a
lower price, by a bigger widening that is defined as […] apart from the fact
that singletons are disregarded. In other words, the undirected graph
considered for this bigger widening, given $sh_w \in SH^W$, is $G = (N, E)$
such that
$E = \{\, (x, y) \mid \{x, y\} \in I(sh_w),\ x \neq y \,\}$ and
$N = \{\, x \mid (x, y) \in E \text{ or } (y, x) \in E \,\}$.

                        Goal-Independent                              Goal-Dependent
              SH^W, ∇_100 SH^W, cautious           SH^ρ    SH^W, ∇_100 SH^W, cautious           SH^ρ
Program         T     NSP      T     NSP      T     NSP      T     NSP      T     NSP      T     NSP
aqua_c        4.1   10703   15.0   10899      ?       ?   27.6   15754   37.8   15754      ?       ?
bryant        0.2    1066    0.2    1066    0.2    1066    1.0    1033    1.0    1781    0.9    1781
caslog        2.1    6506    4.5    6539  744.4    7027   17.9   11054   21.8   11054      ?       ?
chat80        0.6    2536    1.1    2536    9.1    2536    4.4    3923    7.7    3926  285.7    5111
eliza         0.1      49    0.1      49    0.1      49    0.3     109    0.5     113    0.5     113
lg_sys        2.7    7328    7.9    7334  725.1    7334
oldchina      0.9    2187    2.6    2189    5.0    2193    5.9    3936    9.5    3936      ?       ?
pets_an       0.6    2525    1.3    2563   19.8    2569    3.7    4664    5.3    4664 1006.5    4710
quot_an       0.3     288    0.4     288    0.3     288    1.4     639    3.2     646    3.1     646
simple_an     0.1     373    0.1     373    0.1     373    0.8     572    1.3     639   17.6     639
slice         0.6     426    0.9     428    0.8     428

Table 4. $SH^W$ with $\nabla_{100}$, $SH^W$ with a more cautious widening, and
plain $SH^\rho$: timings and precision.
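
The undirected graph $G = (N, E)$ described in the text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is taken to be a plain collection of sharing groups, a pair {x, y} is considered present whenever x and y occur together in some group, and the name `sharing_graph` is hypothetical.

```python
from itertools import combinations

def sharing_graph(groups):
    """Build (N, E) from sharing groups; E holds both orientations of each edge."""
    edges = set()
    for group in groups:
        # Assumption: two distinct variables are linked if some group has both.
        for x, y in combinations(sorted(group), 2):
            edges.add((x, y))
            edges.add((y, x))
    # Only variables that take part in some edge become nodes,
    # so variables occurring in singleton groups only are disregarded.
    nodes = {x for (x, _) in edges}
    return nodes, edges

# Hypothetical example: z occurs only in a singleton group.
nodes, edges = sharing_graph([{"x", "y"}, {"y", "w"}, {"z"}])
print(sorted(nodes))
```

In this example the variable z does not appear among the nodes, which is the sense in which singletons are disregarded.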

Now, suppose we perform sharing analysis without combining the sharing
domain with Pos. Then using a more or less precise widening makes a difference.
Table 4 reports the results (fixpoint time and number of definitely
not-sharing pairs) for $SH^W$ with $\nabla_{100}$, $SH^W$ with a more cautious
widening, and plain $SH^\rho$. The GD analysis of bryant is a particularly
eloquent example of the superiority of more cautious widenings when Pos is not
used.
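
As an illustration of the precision measure, the sketch below counts the definitely not-sharing pairs for a single sharing set: two variables definitely do not share if no sharing group contains both. The sketch works on one sharing set only and does not model how the totals reported in Table 4 are accumulated over a program; the function name `not_sharing_pairs` and the example data are assumptions.

```python
from itertools import combinations

def not_sharing_pairs(variables, groups):
    """Return the pairs of variables that occur together in no sharing group."""
    may_share = set()
    for group in groups:
        may_share.update(frozenset(pair) for pair in combinations(group, 2))
    all_pairs = {frozenset(pair) for pair in combinations(variables, 2)}
    return all_pairs - may_share

# Hypothetical sharing set over four variables: only x and y may share.
nsp = not_sharing_pairs(["w", "x", "y", "z"], [{"x", "y"}, {"z"}])
print(len(nsp))  # prints: 5
```

A coarser description of the same state, for instance one that also contains the group {x, y, z}, would yield fewer such pairs, which is why a higher count indicates a more precise analysis.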



6 Conclusion 

We believe we have made a significant step forward towards the solution of
the problem of practical, precise, and efficient sharing analysis of (constraint)
logic programs. We have studied a new representation for set-sharing that allows
for the incorporation of a variety of widenings. Extensive experimentation has
shown that one of these widenings, which is based on an idea of C. Fecht,
provides seemingly hard-to-beat precision and performance when combined with
Pos. When this combination is not performed, we have also shown that “more
cautious” widenings offer more precision at an acceptable extra cost.

We are now studying how to increase the precision of the analysis beyond the
limits of set-sharing. This includes more precise tracking of freeness and linearity,
and the efficient incorporation of structural information into the analysis domain.